
Hierarchical Consistent Contrastive Learning for Skeleton-Based Action Recognition with Growing Augmentations (2211.13466v3)

Published 24 Nov 2022 in cs.CV

Abstract: Contrastive learning has proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods use carefully designed augmentations to generate different movement patterns of skeletons that share the same semantics. However, applying strong augmentations, which distort the structure of images or skeletons and cause semantic loss, remains an open problem because they make training unstable. In this paper, we investigate the potential of adopting strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradually growing augmentation policy to generate multiple ordered positive pairs, which guide the learned representations from different views toward consistency. Then, an asymmetric loss is proposed to enforce the hierarchical consistency via a directional clustering operation in the feature space, pulling the representations from strongly augmented views closer to those from weakly augmented views for better generalizability. Meanwhile, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms state-of-the-art methods on three large-scale datasets: NTU60, NTU120, and PKUMMD.
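
The two ideas in the abstract, a growing augmentation policy and a directional (asymmetric) consistency loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the anchor bank, the temperature, and the choice of a KL divergence between similarity distributions are all assumptions made for illustration; the abstract only says that strongly augmented views are pulled toward weakly augmented ones.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def growing_views(x, augmentations):
    """Build ordered views: view k has passed through the first k augmentations.

    `augmentations` is a hypothetical list of callables ordered from weak to
    strong; each one is applied on top of the previous view.
    """
    views = [x]
    for aug in augmentations:
        views.append(aug(views[-1]))
    return views


def directional_consistency(z_weak, z_strong, anchors, temperature=0.1):
    """Assumed form of the asymmetric term: KL(p_weak || p_strong).

    Each distribution is a softmax over cosine similarities to a bank of
    anchor embeddings. The weak view is the target; in a real training loop
    it would be detached so gradients only move the strong view.
    """
    a = l2_normalize(anchors)
    p_weak = softmax(l2_normalize(z_weak) @ a.T / temperature)
    p_strong = softmax(l2_normalize(z_strong) @ a.T / temperature)
    kl = p_weak * (np.log(p_weak + 1e-12) - np.log(p_strong + 1e-12))
    return kl.sum(axis=-1).mean()


def hierarchical_loss(embeddings, anchors, temperature=0.1):
    """Sum the directional term over adjacent views, ordered weak to strong."""
    return sum(
        directional_consistency(embeddings[i], embeddings[i + 1], anchors, temperature)
        for i in range(len(embeddings) - 1)
    )
```

In this sketch each stronger view is only compared with the view one step weaker, so the consistency constraint is applied along the ordered chain rather than between arbitrary pairs. Whether the paper chains views this way is not stated in the abstract.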

Authors (3)
  1. Jiahang Zhang (8 papers)
  2. Lilang Lin (11 papers)
  3. Jiaying Liu (99 papers)
Citations (36)
