Learnable Graph Matching: A Practical Paradigm for Data Association (2303.15414v2)
Abstract: Data association is at the core of many computer vision tasks, e.g., multiple object tracking (MOT), image matching, and point cloud registration. However, current data association solutions have two shortcomings. First, they mostly ignore intra-view context information. Second, they either train deep association models end-to-end and thus make little use of optimization-based assignment methods, or they only use an off-the-shelf neural network to extract features. In this paper, we propose a general learnable graph matching method to address these issues. Specifically, we model the intra-view relationships as an undirected graph, so that data association becomes a general graph matching problem between two graphs. Furthermore, to make the optimization end-to-end differentiable, we relax the original graph matching problem into a continuous quadratic program and then incorporate its training into a deep graph neural network using the KKT conditions and the implicit function theorem. On the MOT task, our method achieves state-of-the-art performance on several MOT datasets. For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet. For point cloud registration, we also achieve competitive results. Code will be available at https://github.com/jiaweihe1996/GMTracker.
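As a rough illustration of the relaxation step mentioned in the abstract, the sketch below matches two small weighted undirected graphs by minimizing a quadratic objective over doubly stochastic matrices and then rounding to a permutation. It is not the paper's implementation: the objective (a Koopmans-Beckmann-style edge-consistency term minus a node-affinity term), the Frank-Wolfe solver, and the names `relaxed_graph_matching` and `objective` are all assumptions chosen for brevity, and nothing here is differentiated through the KKT conditions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def objective(A1, A2, B, X):
    """Assumed relaxed objective: ||A1 X - X A2||_F^2 - <B, X>."""
    R = A1 @ X - X @ A2
    return float(np.sum(R * R) - np.sum(B * X))


def relaxed_graph_matching(A1, A2, B, iters=200):
    """Frank-Wolfe over the set of doubly stochastic matrices (illustrative only)."""
    n = A1.shape[0]
    X = np.full((n, n), 1.0 / n)                 # uniform doubly stochastic start
    for _ in range(iters):
        R = A1 @ X - X @ A2                      # edge-consistency residual
        G = 2.0 * (A1.T @ R - R @ A2.T) - B      # gradient of the objective at X
        rows, cols = linear_sum_assignment(G)    # vertex minimizing <G, S>
        S = np.zeros_like(X)
        S[rows, cols] = 1.0
        D = S - X                                # Frank-Wolfe direction
        slope = np.sum(G * D)
        if slope >= -1e-12:                      # no descent direction left
            break
        RD = A1 @ D - D @ A2
        curv = 2.0 * np.sum(RD * RD)
        # Exact line search for a quadratic, clipped to the feasible step [0, 1].
        t = 1.0 if curv <= 0.0 else min(1.0, -slope / curv)
        X = X + t * D
    return X


# Toy data: the second graph is a node-permuted copy of the first.
rng = np.random.default_rng(0)
n = 5
W = rng.random((n, n))
A1 = np.triu(W, 1)
A1 = A1 + A1.T                                   # symmetric weights, undirected graph
perm = rng.permutation(n)
P = np.eye(n)[perm]                              # node i of graph 1 maps to perm[i]
A2 = P.T @ A1 @ P
B = np.zeros((n, n))                             # no node affinities in this toy case

X0 = np.full((n, n), 1.0 / n)
X = relaxed_graph_matching(A1, A2, B)
_, match = linear_sum_assignment(-X)             # round the soft solution to a permutation
print("ground truth:", perm, "recovered:", match)
print("objective:", objective(A1, A2, B, X0), "->", objective(A1, A2, B, X))
```

Because the feasible set is convex and the objective is a convex quadratic, a generic convex solver could replace the Frank-Wolfe loop. Frank-Wolfe is used here only because each step reduces to a linear assignment problem, which `scipy.optimize.linear_sum_assignment` solves directly.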