BigGait: Learning Gait Representation You Want by Large Vision Models (2402.19122v2)
Abstract: Gait recognition stands as one of the most pivotal remote identification technologies and is progressively expanding across research and industry communities. However, existing gait recognition methods rely heavily on task-specific upstream models driven by supervised learning to provide explicit gait representations such as silhouette sequences, which inevitably introduce expensive annotation costs and potential error accumulation. Departing from this trend, this work explores effective gait representations based on the all-purpose knowledge produced by task-agnostic Large Vision Models (LVMs) and proposes a simple yet efficient gait framework, termed BigGait. Specifically, the Gait Representation Extractor (GRE) within BigGait draws upon design principles from established gait representations, effectively transforming all-purpose knowledge into implicit gait representations without requiring third-party supervision signals. Experiments on CCPG, CASIA-B* and SUSTech1K indicate that BigGait significantly outperforms previous methods in both within-domain and cross-domain tasks in most cases, and provides a more practical paradigm for learning next-generation gait representations. Finally, we delve into prospective challenges and promising directions in LVM-based gait recognition, aiming to inspire future work on this emerging topic. The source code is available at https://github.com/ShiqiYu/OpenGait.