Boosting Box-supervised Instance Segmentation with Pseudo Depth (2403.01214v1)
Abstract: Weakly Supervised Instance Segmentation (WSIS) under box supervision has attracted substantial attention and advanced considerably in recent years. However, box supervision provides no effective information for distinguishing foreground from background within the target box. This work addresses that limitation by introducing pseudo-depth maps into the training of the instance segmentation network, improving its performance by capturing depth differences between instances. The pseudo-depth maps are generated with an off-the-shelf depth predictor and are not needed at inference. To let the network exploit depth features when predicting masks, we integrate a depth prediction layer into the mask prediction head, so that the network predicts masks and depth simultaneously. We further use the masks generated during training as supervision to distinguish foreground from background. When selecting the best mask for each box with the Hungarian algorithm, we use depth consistency as one term of the matching cost. The proposed method achieves significant improvements on the Cityscapes and COCO datasets.
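The mask-to-box matching step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the cost formula, so the box-IoU term, the use of pseudo-depth variance as the "depth consistency" term, the `depth_weight` parameter, and all function names below are assumptions made for illustration. For brevity, the sketch finds the minimum-cost one-to-one assignment by brute force, which gives the same optimum the Hungarian algorithm would on inputs this small.

```python
from itertools import permutations


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def depth_variance(depths):
    """Population variance of the pseudo-depth values under a mask.

    Assumption: low variance is used here as a stand-in for depth consistency.
    """
    if not depths:
        return 0.0
    mean = sum(depths) / len(depths)
    return sum((d - mean) ** 2 for d in depths) / len(depths)


def matching_cost(candidate, gt_box, depth_weight=1.0):
    """Hypothetical cost: box mismatch plus a weighted depth-inconsistency term."""
    box_term = 1.0 - box_iou(candidate["box"], gt_box)
    depth_term = depth_variance(candidate["depths"])
    return box_term + depth_weight * depth_term


def assign_masks_to_boxes(candidates, gt_boxes, depth_weight=1.0):
    """Pick one distinct candidate mask per box, minimising the total cost.

    Brute force over all assignments; a stand-in for the Hungarian algorithm.
    Returns (list of candidate indices, one per box; total cost).
    """
    if len(candidates) < len(gt_boxes):
        raise ValueError("need at least one candidate mask per box")
    cost = [[matching_cost(c, g, depth_weight) for g in gt_boxes] for c in candidates]
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(len(candidates)), len(gt_boxes)):
        total = sum(cost[ci][bi] for bi, ci in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


# Toy data: each candidate carries the box enclosing its mask and the
# pseudo-depth values sampled under the mask (both hypothetical).
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
candidates = [
    {"box": (0, 0, 10, 10), "depths": [0.0, 1.0, 0.0, 1.0]},   # tight box, mixed depth
    {"box": (1, 1, 10, 10), "depths": [0.5, 0.5, 0.5, 0.5]},   # looser box, uniform depth
    {"box": (20, 20, 30, 30), "depths": [0.8, 0.8]},           # matches the second box
]

with_depth, _ = assign_masks_to_boxes(candidates, gt_boxes, depth_weight=1.0)
without_depth, _ = assign_masks_to_boxes(candidates, gt_boxes, depth_weight=0.0)
print(with_depth, without_depth)
```

In this toy setup the depth term changes which candidate wins the first box: without it the tight but depth-inconsistent mask is chosen, and with it the depth-uniform mask is preferred. For realistic numbers of candidates, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would replace the brute-force loop.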
- Zhou, D., Fang, J., Song, X., Liu, L., Yin, J., Dai, Y., Li, H., Yang, R.: Joint 3d instance segmentation and object detection for autonomous driving. In: CVPR, pp. 1839–1849 (2020) Feng et al. [2020] Feng, D., Haase-Schütz, C., Rosenbaum, L., Hertlein, H., Glaeser, C., Timm, F., Wiesbeck, W., Dietmayer, K.: Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Transactions on Intelligent Transportation Systems 22(3), 1341–1360 (2020) Minaee et al. [2021] Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. PAMI (2021) Ahn et al. [2019] Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. 
[2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. 
[2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Feng, D., Haase-Schütz, C., Rosenbaum, L., Hertlein, H., Glaeser, C., Timm, F., Wiesbeck, W., Dietmayer, K.: Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Transactions on Intelligent Transportation Systems 22(3), 1341–1360 (2020) Minaee et al. 
[2021] Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. PAMI (2021) Ahn et al. [2019] Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 
12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. PAMI (2021) Ahn et al. [2019] Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. 
[2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. 
[2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. 
[2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). 
Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. 
[2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. 
[2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. 
[2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. 
[2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. 
In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. 
PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. 
In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. 
In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. 
[2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018)
3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Feng, D., Haase-Schütz, C., Rosenbaum, L., Hertlein, H., Glaeser, C., Timm, F., Wiesbeck, W., Dietmayer, K.: Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Transactions on Intelligent Transportation Systems 22(3), 1341–1360 (2020) Minaee et al. [2021] Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. PAMI (2021) Ahn et al. [2019] Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 
282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. PAMI (2021) Ahn et al. [2019] Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. 
[2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. 
[2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. 
[2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. 
[2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). 
Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. 
[2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. 
[2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. 
[2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. 
[2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. 
In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. 
PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. 
In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. 
In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020)
Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp.
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 
3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Minaee, S., Boykov, Y.Y., Porikli, F., Plaza, A.J., Kehtarnavaz, N., Terzopoulos, D.: Image segmentation using deep learning: A survey. PAMI (2021) Ahn et al. [2019] Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. 
[2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. 
[2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. 
[2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. 
[2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. 
PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. 
In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. 
In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. 
[2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 
3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Ahn, J., Cho, S., Kwak, S.: Weakly supervised learning of instance segmentation with inter-pixel relations. In: CVPR, pp. 2209–2218 (2019) He et al. [2017] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). 
Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017) Tian et al. [2020] Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. 
[2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. 
[2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. 
PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. 
[2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. 
[2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. 
[2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. 
[2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 
3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
- He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: ICCV, pp. 2961–2969 (2017)
[2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. 
[2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. 
In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 
10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 
876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 
606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). 
PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. 
arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 
2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. 
In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
- Tian, Z., Shen, C., Chen, H., He, T.: Fcos: A simple and strong anchor-free object detector. PAMI 44(4), 1922–1933 (2020) Dosovitskiy et al. [2020] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. 
[2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. 
[2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. 
In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. 
[2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 
3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020) Lin et al. [2014] Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. 
[2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. 
[2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. 
In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. 
[2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021)
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
- Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: ECCV, pp. 740–755 (2014). Springer Gupta et al. [2019] Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019) Chen et al. [2019] Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. 
[2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. 
[2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. 
[2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). 
PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. 
arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 
2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. 
In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. 
[2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. 
[2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Gupta, A., Dollar, P., Girshick, R.: Lvis: A dataset for large vocabulary instance segmentation. In: CVPR, pp. 5356–5364 (2019)
- Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019)
- Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020)
- Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer
- Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022)
- Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021)
- Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer
- Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019)
- Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020)
- Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019)
- Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE
- Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer
- Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017)
- Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004)
- Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019)
- Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021)
- Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021)
- Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer
- Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022)
- Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018)
- Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020)
- Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022)
- Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021)
- Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR
- Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR
- Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022)
- Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer
- Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021)
- Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020)
- Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021)
- Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021)
- Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
- Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016)
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
- Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955)
- Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019)
- Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018)
- Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021)
- Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022)
- Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020)
- Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017)
- Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images.
In: CVPR, pp. 10225–10235 (2021) Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019) Wang et al. [2020] Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. 
[2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. 
NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. 
[2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). 
PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. 
arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 
2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. 
In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. 
[2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
- Chen, K., Pang, J., Wang, J., Xiong, Y., Li, X., Sun, S., Feng, W., Liu, Z., Shi, J., Ouyang, W., et al.: Hybrid task cascade for instance segmentation. In: CVPR, pp. 4974–4983 (2019)
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. 
In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 
876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 
606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). 
PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Wang, X., Zhang, R., Kong, T., Li, L., Shen, C.: Solov2: Dynamic and fast instance segmentation. NeurIPS 33, 17721–17732 (2020) Tian et al. [2020] Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. 
[2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. 
[2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. 
[2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. 
In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 
876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 
606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). 
PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Tian, Z., Shen, C., Chen, H.: Conditional convolutions for instance segmentation. In: ECCV, pp. 282–298 (2020). Springer Ke et al. [2022] Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. 
[2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. 
[2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. 
In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 
10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 
876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 
606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). 
PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. 
arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Ke, L., Danelljan, M., Li, X., Tai, Y.-W., Tang, C.-K., Yu, F.: Mask transfiner for high-quality instance segmentation. In: CVPR, pp. 4412–4421 (2022) Ranftl et al. [2021] Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. 
[2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. 
[2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. 
In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 
10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 
876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 
606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). 
PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. 
arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 
2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. 
In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. 
[2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. 
[2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV, pp. 12179–12188 (2021) Arun et al. [2020] Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. 
[2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer Zhu et al. [2019] Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. 
[2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. 
[2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019) Liu et al. [2020] Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. 
In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
- Arun, A., Jawahar, C., Kumar, M.P.: Weakly supervised instance segmentation by learning annotation consistent instances. In: ECCV, pp. 254–270 (2020). Springer
876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 
606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). 
PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. 
arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 
2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. 
In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
- Zhu, Y., Zhou, Y., Xu, H., Ye, Q., Doermann, D., Jiao, J.: Learning instance activation maps for weakly supervised instance segmentation. In: CVPR, pp. 3116–3125 (2019)
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 
- Liu, Y., Wu, Y.-H., Wen, P., Shi, Y., Qiu, Y., Cheng, M.-M.: Leveraging instance-, image-and dataset-level information for weakly supervised instance segmentation. PAMI 44(3), 1415–1428 (2020) Ge et al. [2019] Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019) Laradji et al. [2020] Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. 
[2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. 
[2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. 
In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 
3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. 
[2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 
3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. 
In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. 
[2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. 
[2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Ge, W., Guo, S., Huang, W., Scott, M.R.: Label-penet: Sequential label propagation and enhancement networks for weakly supervised instance segmentation. In: ICCV, pp. 3345–3354 (2019)
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 
1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 
2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. 
[2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 
1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 
3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
- Laradji, I.H., Rostamzadeh, N., Pinheiro, P.O., Vazquez, D., Schmidt, M.: Proposal-based instance segmentation with point supervision. In: ICIP, pp. 2126–2130 (2020). IEEE Tang et al. [2022] Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer Khoreva et al. [2017] Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 
4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017) Rother et al. [2004] Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 
10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. 
[2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. 
In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. 
[2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 
3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Tang, C., Xie, L., Zhang, G., Zhang, X., Tian, Q., Hu, X.: Active pointly-supervised instance segmentation. In: ECCV, pp. 606–623 (2022). Springer
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Kendall et al.
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Khoreva, A., Benenson, R., Hosang, J., Hein, M., Schiele, B.: Simple does it: Weakly supervised instance and semantic segmentation. In: CVPR, pp. 876–885 (2017)
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018)
- Rother, C., Kolmogorov, V., Blake, A.: “grabcut” interactive foreground extraction using iterated graph cuts. TOG 23(3), 309–314 (2004) Hsu et al. [2019] Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hsu, C.-C., Hsu, K.-J., Tsai, C.-C., Lin, Y.-Y., Chuang, Y.-Y.: Weakly supervised instance segmentation using the bounding box tightness prior. NeurIPS 32 (2019) Tian et al. [2021] Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 
5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021) Lan et al. [2021] Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. 
arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021) Li et al. [2022] Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. 
[2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. 
arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021)
- Tian, Z., Shen, C., Wang, X., Chen, H.: Boxinst: High-performance instance segmentation with box annotations. In: CVPR, pp. 5443–5452 (2021)
- Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021)
- Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer
- Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022)
- Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018)
- Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020)
- Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022)
- Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021)
- Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR
- Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR
- Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022)
- Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer
- Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021)
- Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020)
- Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021)
- Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021)
- Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
- Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016)
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: ICCV, pp. 10012–10022 (2021)
- Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955)
- Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019)
- Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018)
- Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021)
- Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022)
- Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020)
- Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017)
- Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. 
[2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
[2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. 
- Lan, S., Yu, Z., Choy, C., Radhakrishnan, S., Liu, G., Zhu, Y., Davis, L.S., Anandkumar, A.: Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision. In: ICCV, pp. 3406–3416 (2021)
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. 
In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. 
[2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. 
[2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Li, W., Liu, W., Zhu, J., Cui, M., Hua, X.-S., Zhang, L.: Box-supervised instance segmentation with level set evolution. In: ECCV, pp. 1–18 (2022). Springer Cheng et al. [2022] Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022) Hu et al. [2018] Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. 
In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018) Zhou et al. [2020] Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. 
[2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. 
[2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. 
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Cheng, T., Wang, X., Chen, S., Zhang, Q., Liu, W.: Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation. arXiv preprint arXiv:2210.05174 (2022)
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 
1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 
770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
- Hu, R., Dollár, P., He, K., Darrell, T., Girshick, R.: Learning to segment every thing. In: CVPR, pp. 4233–4241 (2018)
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 
7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. 
Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. 
[2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Zhou, Y., Wang, X., Jiao, J., Darrell, T., Yu, F.: Learning saliency propagation for semi-supervised instance segmentation. In: CVPR, pp. 10307–10316 (2020) Wang et al. [2022] Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. 
[2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022) Lee et al. [2021] Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Lee, J., Yi, J., Shin, C., Yoon, S.: Bbam: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021) Xie et al. [2020] Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 
3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. 
[2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. 
In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. 
[2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 
8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. 
In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
- Wang, Z., Li, Y., Wang, S.: Noisy boundaries: Lemon or lemonade for semi-supervised instance segmentation? In: CVPR, pp. 16826–16835 (2022)
10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. 
In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 
10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. 
[2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. 
[2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Lee, J., Yi, J., Shin, C., Yoon, S.: BBAM: Bounding box attribution map for weakly supervised semantic and instance segmentation. In: CVPR, pp. 2643–2652 (2021)
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 
1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 
770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
- Xie, C., Xiang, Y., Mousavian, A., Fox, D.: The best of both modes: Separately leveraging rgb and depth for unseen object instance segmentation. In: Conference on Robot Learning, pp. 1369–1378 (2020). PMLR Xiang et al. [2021] Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR Gao et al. [2022] Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. [2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: Panopticdepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022) Yuan et al. 
[2022] Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. 
[2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer Liu et al. [2021] Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021) Sohn et al. [2020] Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. [2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020) Xu et al. 
[2021] Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 
2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021) Chen et al. [2021] Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 
1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021) Wang et al. [2022] Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. 
[2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022) He et al. [2016] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016) Cordts et al. 
[2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 
770–778 (2016) Cordts et al. [2016] Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016) Liu et al. [2021] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021) Kuhn [1955] Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955) Maninis et al. [2019] Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. 
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. 
[2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Xiang, Y., Xie, C., Mousavian, A., Fox, D.: Learning rgb-d feature embeddings for unseen object instance segmentation. In: Conference on Robot Learning, pp. 461–470 (2021). PMLR
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019) Kendall et al. [2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. 
[2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 
10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Gao, N., He, F., Jia, J., Shan, Y., Zhang, H., Zhao, X., Huang, K.: PanopticDepth: A unified framework for depth-aware panoptic segmentation. In: CVPR, pp. 1632–1642 (2022)
- Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: PolyphonicFormer: Unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer
- Liu, Y.-C., Ma, C.-Y., He, Z., Kuo, C.-W., Chen, K., Zhang, P., Wu, B., Kira, Z., Vajda, P.: Unbiased teacher for semi-supervised object detection. arXiv preprint arXiv:2102.09480 (2021)
- Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020)
- Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021)
- Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021)
- Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
- Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The Cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016)
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin Transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
- Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2(1-2), 83–97 (1955)
- Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019)
- Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018)
- Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021)
- Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022)
- Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: SDC-Depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020)
- Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017)
- Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
[2018] Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018) Saha et al. [2021] Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 
541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021) Wang et al. [2022] Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. 
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Yuan, H., Li, X., Yang, Y., Cheng, G., Zhang, J., Tong, Y., Zhang, L., Tao, D.: Polyphonicformer: unified query learning for depth-aware video panoptic segmentation. In: ECCV, pp. 582–599 (2022). Springer
[2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Sohn, K., Zhang, Z., Li, C.-L., Zhang, H., Lee, C.-Y., Pfister, T.: A simple semi-supervised learning framework for object detection. arXiv preprint arXiv:2005.04757 (2020)
- Xu, M., Zhang, Z., Hu, H., Wang, J., Wang, L., Wei, F., Bai, X., Liu, Z.: End-to-end semi-supervised object detection with soft teacher. In: ICCV, pp. 3060–3069 (2021)
10225–10235 (2021) Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022) Wang et al. [2020] Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020) Tarvainen and Valpola [2017] Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017) Wang et al. [2021] Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021) Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)
- Chen, X., Yuan, Y., Zeng, G., Wang, J.: Semi-supervised semantic segmentation with cross pseudo supervision. In: CVPR, pp. 2613–2622 (2021)
- Wang, Y., Wang, H., Shen, Y., Fei, J., Li, W., Jin, G., Wu, L., Zhao, R., Le, X.: Semi-supervised semantic segmentation using unreliable pseudo-labels. In: CVPR, pp. 4248–4257 (2022)
- He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
- Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CVPR, pp. 3213–3223 (2016)
- Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022 (2021)
- Kuhn, H.W.: The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2), 83–97 (1955)
- Maninis, K.-K., Radosavovic, I., Kokkinos, I.: Attentive single-tasking of multiple tasks. In: CVPR, pp. 1851–1860 (2019)
- Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491 (2018)
- Saha, S., Obukhov, A., Paudel, D.P., Kanakis, M., Chen, Y., Georgoulis, S., Van Gool, L.: Learning to relate depth and semantics for unsupervised domain adaptation. In: CVPR, pp. 8197–8207 (2021)
- Wang, Y., Tsai, Y.-H., Hung, W.-C., Ding, W., Liu, S., Yang, M.-H.: Semi-supervised multi-task learning for semantics and depth. In: WACV, pp. 2505–2514 (2022)
- Wang, L., Zhang, J., Wang, O., Lin, Z., Lu, H.: Sdc-depth: Semantic divide-and-conquer network for monocular depth estimation. In: CVPR, pp. 541–550 (2020)
- Tarvainen, A., Valpola, H.: Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. NeurIPS 30 (2017)
- Wang, X., Feng, J., Hu, B., Ding, Q., Ran, L., Chen, X., Liu, W.: Weakly-supervised instance segmentation via class-agnostic learning with salient images. In: CVPR, pp. 10225–10235 (2021)