DRKF: Distilled Rotated Kernel Fusion for Efficient Rotation Invariant Descriptors in Local Feature Matching
Abstract: The performance of local feature descriptors degrades under large rotation variations. To address this issue, we present an efficient approach to learning rotation invariant descriptors. Specifically, we propose Rotated Kernel Fusion (RKF), which imposes rotations on the convolution kernel to improve the inherent robustness of the CNN to rotation. Because RKF can be folded into a single kernel by subsequent re-parameterization, it introduces no extra computational cost at inference. Moreover, we present Multi-oriented Feature Aggregation (MOFA), which aggregates features extracted from multiple rotated versions of the input image and provides auxiliary knowledge for training RKF through a distillation strategy. We refer to the distilled RKF model as DRKF. In addition to evaluating on a rotation-augmented version of the public HPatches dataset, we contribute a new dataset named DiverseBEV, which was collected during drone flights and consists of bird's eye view images with large viewpoint changes and camera rotations. Extensive experiments show that our method outperforms other state-of-the-art techniques under large rotation variations.
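The claim that re-parameterization removes the inference cost rests on the linearity of convolution: summing the outputs of several branches that share an input is the same as convolving once with the sum of their kernels. The sketch below illustrates only that identity and is not the authors' implementation. It assumes 90-degree rotation steps, fusion by summation, a single channel, and no bias or nonlinearity; the abstract specifies none of these, and all function names are hypothetical.

```python
# Sketch: folding rotated-kernel branches into one kernel (re-parameterization).
# Assumptions (not from the abstract): 90-degree rotations, sum fusion,
# single-channel stride-1 "valid" cross-correlation, no bias or nonlinearity.

def rot90(k):
    """Rotate a square kernel (list of lists) 90 degrees counter-clockwise."""
    n = len(k)
    return [[k[j][n - 1 - i] for j in range(n)] for i in range(n)]

def correlate2d(img, k):
    """Valid-mode 2D cross-correlation, as computed by an unpadded conv layer."""
    n = len(k)
    h, w = len(img) - n + 1, len(img[0]) - n + 1
    return [[sum(img[y + i][x + j] * k[i][j]
                 for i in range(n) for j in range(n))
             for x in range(w)] for y in range(h)]

def rotated_branches(k, num_rot=4):
    """Return the kernel together with its successive 90-degree rotations."""
    out = [k]
    for _ in range(num_rot - 1):
        out.append(rot90(out[-1]))
    return out

def fuse(kernels):
    """Element-wise sum of the branch kernels: the single inference kernel."""
    n = len(kernels[0])
    return [[sum(k[i][j] for k in kernels) for j in range(n)] for i in range(n)]

def multi_branch_forward(img, kernels):
    """Training-time view: run every branch, then sum the feature maps."""
    outs = [correlate2d(img, k) for k in kernels]
    return [[sum(o[y][x] for o in outs) for x in range(len(outs[0][0]))]
            for y in range(len(outs[0]))]

kernel = [[1, 2, 0], [0, -1, 3], [2, 0, 1]]
image = [[(3 * y + 5 * x) % 7 for x in range(5)] for y in range(5)]

branches = rotated_branches(kernel)
train_out = multi_branch_forward(image, branches)   # four correlations
infer_out = correlate2d(image, fuse(branches))      # one correlation

# Integer arithmetic, so the two paths agree exactly.
assert train_out == infer_out
```

With these assumptions the multi-branch form costs four correlations per layer, while the fused form costs one, which is the sense in which the rotated branches are free at inference.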