Graph Convolution Based Efficient Re-Ranking for Visual Retrieval (2306.08792v1)
Abstract: Visual retrieval tasks such as image retrieval and person re-identification (Re-ID) aim at effectively and thoroughly searching for images with similar content or the same identity. After obtaining retrieved examples, re-ranking is a widely adopted post-processing step that reorders and improves the initial retrieval results by making use of contextual information from semantically neighboring samples. Prevailing re-ranking approaches update distance metrics and mostly rely on inefficient cross-check set comparison operations when computing expanded-neighbor distances. In this work, we present an efficient re-ranking method which refines initial retrieval results by updating features. Specifically, we reformulate re-ranking based on Graph Convolution Networks (GCN) and propose a novel Graph Convolution based Re-ranking (GCR) for visual retrieval tasks via feature propagation. To accelerate computation for large-scale retrieval, a decentralized and synchronous feature propagation algorithm which supports parallel or distributed computing is introduced. In particular, the plain GCR is extended for cross-camera retrieval, and an improved feature propagation formulation is presented to leverage affinity relationships across different cameras. It is also extended for video-based retrieval, and Graph Convolution based Re-ranking for Video (GCRV) is proposed by mathematically deriving a novel profile vector generation method for the tracklet. Without bells and whistles, the proposed approaches achieve state-of-the-art performance on seven benchmark datasets from three different tasks, i.e., image retrieval, person Re-ID and video-based person Re-ID.
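The core idea of feature-based re-ranking described above — build an affinity graph over query and gallery features, then smooth each feature by propagating its neighbors' features along the graph before computing final distances — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual GCR implementation: the parameter names (`k`, `alpha`, `iters`) and the exponential k-NN affinity weighting are assumptions for demonstration.

```python
import numpy as np

def gcr_rerank(query_feats, gallery_feats, k=10, alpha=0.5, iters=2):
    """Hypothetical sketch of graph-convolution-style re-ranking:
    smooth features over a k-NN affinity graph, then rank gallery
    items by distance to the refined query features."""
    X = np.vstack([query_feats, gallery_feats]).astype(np.float64)
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    for _ in range(iters):
        S = X @ X.T                           # cosine similarities
        np.fill_diagonal(S, -np.inf)          # exclude self-matches
        idx = np.argsort(-S, axis=1)[:, :k]   # k nearest neighbors per node
        A = np.zeros_like(S)
        rows = np.arange(S.shape[0])[:, None]
        A[rows, idx] = np.exp(S[rows, idx])   # weighted k-NN affinity graph
        A /= A.sum(axis=1, keepdims=True)     # row-normalize (random-walk form)
        X = alpha * X + (1.0 - alpha) * (A @ X)   # one feature-propagation step
        X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    nq = len(query_feats)
    dist = 1.0 - X[:nq] @ X[nq:].T            # refined query-to-gallery distances
    return np.argsort(dist, axis=1)           # re-ranked gallery order per query

# usage: a gallery item close to the query stays ranked first after propagation
q = np.array([[1.0, 0.0]])
g = np.array([[0.9, 0.1], [0.0, 1.0]])
order = gcr_rerank(q, g, k=1, iters=1)
```

Updating features rather than distances is what makes this style of re-ranking amenable to the decentralized, parallel computation mentioned in the abstract: each node's update depends only on its local neighborhood.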
- Yuqi Zhang
- Qi Qian
- Hongsong Wang
- Chong Liu
- Weihua Chen
- Fan Wang