Unsupervised Landmark Discovery Using Consistency Guided Bottleneck (2309.10518v1)

Published 19 Sep 2023 in cs.CV

Abstract: We study the challenging problem of unsupervised discovery of object landmarks. Many recent methods rely on bottlenecks to generate 2D Gaussian heatmaps; however, these are limited in generating informed heatmaps during training, presumably due to the lack of effective structural cues. They also assume that all predicted landmarks are semantically relevant despite having no ground-truth supervision. In the current work, we introduce a consistency-guided bottleneck in an image reconstruction-based pipeline that leverages landmark consistency, a measure of compatibility with the pseudo-ground truth, to generate adaptive heatmaps. We propose obtaining pseudo-supervision by forming landmark correspondences across images. The consistency then modulates the uncertainty of the discovered landmarks in the generation of adaptive heatmaps, which rank consistent landmarks above their noisy counterparts, providing effective structural information for improved robustness. Evaluations on five diverse datasets, including MAFL, AFLW, LS3D, Cats, and Shoes, demonstrate excellent performance of the proposed approach compared to the existing state-of-the-art methods. Our code is publicly available at https://github.com/MamonaAwan/CGB_ULD.
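
To make the idea of a consistency-modulated heatmap concrete, the following is a minimal sketch and not the authors' implementation. It assumes a consistency score in [0, 1] and a simple linear mapping from that score to the Gaussian standard deviation, so that a more consistent landmark gets a sharper heatmap. The function name `adaptive_heatmap`, the parameters `sigma_min` and `sigma_max`, and the linear mapping are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def adaptive_heatmap(x, y, consistency, size=32, sigma_min=1.0, sigma_max=4.0):
    """Render one 2D Gaussian heatmap whose spread depends on consistency.

    x, y        : landmark location in pixel coordinates
    consistency : score in [0, 1]; values outside the range are clipped
    Returns (heatmap as a size x size list of lists, sigma used).
    """
    # Clip the score so the mapping below stays inside [sigma_min, sigma_max].
    c = min(max(consistency, 0.0), 1.0)
    # Assumed mapping: high consistency -> low uncertainty -> small sigma.
    sigma = sigma_max - c * (sigma_max - sigma_min)
    heat = [
        [
            math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
            for col in range(size)
        ]
        for row in range(size)
    ]
    return heat, sigma


# A consistent landmark (score 1.0) and a noisy one (score 0.0) at the same spot.
sharp, sigma_sharp = adaptive_heatmap(8, 8, consistency=1.0)
blurry, sigma_blurry = adaptive_heatmap(8, 8, consistency=0.0)
```

Under this assumed mapping, the consistent landmark yields a tightly peaked heatmap, while the noisy one is spread over a wider area and so carries less precise structural information into the reconstruction.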
