
Self-Supervised Video Similarity Learning (2304.03378v2)

Published 6 Apr 2023 in cs.CV and cs.LG

Abstract: We introduce S$^2$VS, a video similarity learning approach with self-supervision. Self-Supervised Learning (SSL) is typically used to train deep models on a proxy task so as to have strong transferability on target tasks after fine-tuning. Here, in contrast to prior work, SSL is used to perform video similarity learning and address multiple retrieval and detection tasks at once with no use of labeled data. This is achieved by learning via instance-discrimination with task-tailored augmentations and the widely used InfoNCE loss together with an additional loss operating jointly on self-similarity and hard-negative similarity. We benchmark our method on tasks where video relevance is defined with varying granularity, ranging from video copies to videos depicting the same incident or event. We learn a single universal model that achieves state-of-the-art performance on all tasks, surpassing previously proposed methods that use labeled data. The code and pretrained models are publicly available at: https://github.com/gkordo/s2vs
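
The abstract names instance discrimination with the InfoNCE loss as the core training signal. As a rough illustration only, the sketch below computes a generic batch-wise InfoNCE loss in NumPy. It is not the authors' implementation: the function name `info_nce`, the temperature value, and the use of one pooled embedding vector per video are assumptions made for this example, and the additional loss on self-similarity and hard-negative similarity is not shown because the abstract does not specify it.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.07):
    """Generic InfoNCE loss over a batch of embedding pairs.

    Row i of `positives` is treated as the positive for row i of
    `anchors` (e.g. two augmented views of the same video); every
    other row in the batch acts as a negative.
    The temperature default is an assumed value, not taken from the paper.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))


# Toy usage: four orthogonal "video" embeddings.
views = np.eye(4)
aligned_loss = info_nce(views, views)          # each anchor matches its positive
shuffled_loss = info_nce(views, views[::-1])   # positives deliberately mismatched
```

In this toy setup the aligned pairs give a much smaller loss than the mismatched ones, which is the behaviour instance discrimination relies on: views of the same video are pulled together while the other videos in the batch are pushed apart.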

Authors (6)
  1. Giorgos Kordopatis-Zilos
  2. Giorgos Tolias
  3. Christos Tzelepis
  4. Ioannis Kompatsiaris
  5. Ioannis Patras
  6. Symeon Papadopoulos
