Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation (2403.14995v1)
Abstract: Unsupervised Domain Adaptation (UDA) endeavors to adjust models trained on a source domain to perform well on a target domain without requiring additional annotations. In the context of domain adaptive semantic segmentation, which tackles UDA for dense prediction, the goal is to circumvent the need for costly pixel-level annotations. Many prevailing baseline methods rely on constructing intermediate domains via cross-domain mixed sampling techniques to mitigate the performance decline caused by domain gaps. However, such approaches generate synthetic data that diverge from real-world distributions, potentially leading the model astray from the true target distribution. To address this challenge, we propose a novel auxiliary task called Guidance Training. This task facilitates the effective utilization of cross-domain mixed sampling techniques while mitigating distribution shifts from the real world. Specifically, Guidance Training guides the model to extract and reconstruct the target-domain feature distribution from mixed data, followed by decoding the reconstructed target-domain features to make pseudo-label predictions. Importantly, integrating Guidance Training incurs minimal training overhead and imposes no additional inference burden. We demonstrate the efficacy of our approach by integrating it with existing methods, consistently improving performance. The implementation will be available at https://github.com/Wenlve-Zhou/Guidance-Training.
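To make the abstract's pipeline concrete, below is a minimal PyTorch sketch of the idea as described: DACS/ClassMix-style cross-domain mixed sampling, plus a hypothetical auxiliary "guidance head" that reconstructs target-domain features from the mixed features before decoding them into pseudo-label predictions. All names here (`class_mix`, `guide_head`, the toy encoder/decoder, and the loss weight) are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch: ClassMix-style mixing + a hypothetical guidance head that
# reconstructs target-domain features from mixed features (per the abstract).
import torch
import torch.nn as nn
import torch.nn.functional as F


def class_mix(src_img, src_lbl, tgt_img):
    """Paste pixels belonging to a random half of the source classes onto the target image."""
    classes = src_lbl.unique()
    chosen = classes[torch.randperm(len(classes))[: max(1, len(classes) // 2)]]
    mask = torch.isin(src_lbl, chosen).unsqueeze(0).float()  # 1 where source wins
    mixed = mask * src_img + (1 - mask) * tgt_img
    return mixed, mask


# Toy stand-ins for the segmentation network (assumed architecture).
encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
seg_head = nn.Conv2d(64, 19, 1)  # 19 classes, as in Cityscapes
# Hypothetical auxiliary head: maps mixed-image features back toward
# target-domain features; it is dropped at inference time.
guide_head = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 64, 1)
)

src_img, tgt_img = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
src_lbl = torch.randint(0, 19, (64, 64))

mixed_img, mix_mask = class_mix(src_img, src_lbl, tgt_img)

with torch.no_grad():  # pass on the clean target image to get references
    tgt_feat = encoder(tgt_img.unsqueeze(0))
    pseudo_lbl = seg_head(tgt_feat).argmax(1)

mixed_feat = encoder(mixed_img.unsqueeze(0))
recon_feat = guide_head(mixed_feat)  # reconstruct target-domain features

# Guidance objectives: feature reconstruction plus pseudo-label prediction
# from the reconstructed features. The 0.1 weight is a placeholder.
loss_rec = F.mse_loss(recon_feat, tgt_feat)
loss_seg = F.cross_entropy(seg_head(recon_feat), pseudo_lbl)
loss = loss_seg + 0.1 * loss_rec
loss.backward()
```

Because the guidance head is used only as an auxiliary training branch and discarded afterward, a design of this shape is consistent with the abstract's claim of minimal training overhead and no additional inference burden.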
Authors: Wenlve Zhou, Zhiheng Zhou, Tianlei Wang, Delu Zeng