
RL-I2IT: Image-to-Image Translation with Deep Reinforcement Learning (2309.13672v7)

Published 24 Sep 2023 in cs.CV and cs.AI

Abstract: Most existing Image-to-Image Translation (I2IT) methods generate images in a single run of a deep learning (DL) model. However, designing such a single-step model is challenging: it requires a huge number of parameters and is prone to poor local minima and overfitting. In this work, we reformulate I2IT as a step-wise decision-making problem via deep reinforcement learning (DRL) and propose a novel framework that performs RL-based I2IT (RL-I2IT). The key feature of the RL-I2IT framework is to decompose a monolithic learning process into small steps, using a lightweight model to progressively transform a source image into a target image. Because handling high-dimensional continuous state and action spaces is challenging in the conventional RL framework, we introduce a meta policy with a new concept, the Plan, into the standard actor-critic model; the plan is of lower dimension than the original image and facilitates the actor in generating a tractable high-dimensional action. The RL-I2IT framework also employs a task-specific auxiliary learning strategy to stabilize training and improve performance on the corresponding task. Experiments on several I2IT tasks demonstrate the effectiveness and robustness of the proposed method on problems with high-dimensional continuous action spaces. Our implementation of the RL-I2IT framework is available at https://github.com/Algolzw/SPAC-Deformable-Registration.
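To make the step-wise formulation concrete, here is a minimal PyTorch-style sketch of the planner-actor interaction the abstract describes: a planner compresses the current state into a low-dimensional plan, and an actor decodes that plan into a full-resolution action that incrementally updates the image. All module architectures, dimensions, and names here (Planner, Actor, latent_dim, the number of steps) are illustrative assumptions, not the paper's implementation; the critic and the soft actor-critic training loop are omitted for brevity.

```python
# Illustrative sketch of RL-I2IT's step-wise planner/actor decomposition.
# Architectures, sizes, and names are assumptions, not the authors' code.
import torch
import torch.nn as nn

class Planner(nn.Module):
    """Compresses the state (source + current image) into a low-dimensional plan."""
    def __init__(self, in_ch=2, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )

    def forward(self, state):
        return self.net(state)

class Actor(nn.Module):
    """Decodes a low-dimensional plan into a full-resolution action (image update)."""
    def __init__(self, latent_dim=64, out_ch=1, size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 32 * (size // 4) ** 2), nn.ReLU(),
            nn.Unflatten(1, (32, size // 4, size // 4)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, out_ch, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, plan):
        return self.net(plan)

planner, actor = Planner(), Actor()
source = torch.rand(1, 1, 64, 64)                # dummy 64x64 grayscale source
current = source.clone()
for _ in range(8):                               # step count is an assumption
    state = torch.cat([source, current], dim=1)  # state = (source, image so far)
    plan = planner(state)                        # low-dimensional "Plan"
    action = actor(plan)                         # tractable high-dimensional action
    current = (current + action).clamp(0.0, 1.0) # small step toward the target
```

The point of the decomposition is that each step only has to produce a small, easy-to-learn update, while the low-dimensional plan keeps policy learning tractable despite the high-dimensional pixel action space.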

