Deformable Generator Networks: Unsupervised Disentanglement of Appearance and Geometry (1806.06298v4)

Published 16 Jun 2018 in cs.LG, cs.AI, cs.CV, and stat.ML

Abstract: We present a deformable generator model that disentangles appearance and geometric information for both image and video data in a purely unsupervised manner. The appearance generator models information related to appearance, including color, illumination, identity, and category, while the geometric generator performs geometric warping, such as rotation and stretching, by generating a deformation field that warps the generated appearance to produce the final image or video sequence. The two generators take independent latent vectors as input, which disentangles the appearance and geometric information in image or video sequences. For video data, a nonlinear transition model is introduced to both the appearance and geometric generators to capture the dynamics over time. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that the appearance and geometric information are well disentangled, and that the learned geometric generator can be conveniently transferred to other image datasets to facilitate knowledge transfer tasks.
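
To make the mechanism described in the abstract concrete, below is a minimal sketch (not the authors' released code) of how two independent latent vectors can drive an appearance network and a geometric network, with the geometric output used as a dense displacement field that warps the generated appearance. The layer sizes, the 64x64 resolution, and the use of PyTorch's grid_sample for warping are illustrative assumptions.

```python
# Sketch of a deformable generator: appearance from z_app, deformation from z_geo,
# final image = appearance warped by the deformation field. Architectural details
# here are assumptions for illustration, not the paper's exact networks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableGenerator(nn.Module):
    def __init__(self, z_dim=64, size=64):
        super().__init__()
        self.size = size
        # Appearance generator: latent -> RGB image in a canonical pose.
        self.appearance = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * size * size), nn.Tanh(),
        )
        # Geometric generator: latent -> per-pixel 2D displacement (dx, dy).
        self.geometry = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 2 * size * size), nn.Tanh(),
        )

    def forward(self, z_app, z_geo):
        b = z_app.shape[0]
        img = self.appearance(z_app).view(b, 3, self.size, self.size)
        disp = self.geometry(z_geo).view(b, self.size, self.size, 2)
        # Identity sampling grid in [-1, 1], offset by the deformation field.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, self.size),
            torch.linspace(-1, 1, self.size),
            indexing="ij",
        )
        grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        return F.grid_sample(img, grid + disp, align_corners=True)

# Independent latents: z_app controls appearance, z_geo controls geometry.
gen = DeformableGenerator()
out = gen(torch.randn(4, 64), torch.randn(4, 64))  # shape (4, 3, 64, 64)
```

Because the displacement field only moves pixels, varying the appearance latent can change color or identity without altering pose, and varying the geometric latent can change pose or shape without altering appearance, which is the disentanglement the paper targets.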
