DISTILLATION OF NEURAL NETWORK MODELS FOR DETECTION AND DESCRIPTION OF IMAGE KEY POINTS

Artem V. Yashchenko, Anatoly V. Belikov, Maxim V. Peterson, Alexey S. Potapov

2020, VOLUME 20, NUMBER 3 (May-June)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2020-20-3-402-409

For citation: Yashchenko A.V., Belikov A.V., Peterson M.V., Potapov A.S. Distillation of neural network models for detection and description of image key points. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2020, vol. 20, no. 3, pp. 402–409 (in Russian). doi: 10.17586/2226-1494-2020-20-3-402-409

Abstract

Subject of Research. Image matching and classification methods, as well as simultaneous localization and mapping (SLAM), are widely used on embedded and mobile devices. Their most resource-intensive stage is the detection and description of image key points. Classical methods for key point detection and description can run in real time on mobile devices, but modern neural network methods, which give better quality, are difficult to run this way because of their computational cost. Speeding up neural network models for key point detection and description is therefore a topical problem. The subject of research is distillation as one of the methods for reducing the size of neural network models. The aim of the study is to obtain a more compact model for key point detection and description and to describe the procedure for designing such a model.

Method. We propose a method for pairing the original model with a new, more compact model so that the new model can then be trained on the output values of the original one. The new model thus learns to reproduce the output of the original model without using image labels. Both networks receive identical images as input.

Main Results. The neural network distillation method was tested for key point detection and description. An objective function and training parameters that gave the best results within the study are proposed. A new data set was created for testing key point detection methods, and a new quality measure for the detected key points and their corresponding local features was introduced. A new model with the same number of parameters, trained in the proposed way, shows higher key point accuracy than the original model. A new model with a significantly smaller number of parameters shows point matching accuracy close to that of the original model.

Practical Relevance. A more compact model for detection and description of image key points was created using the proposed method. The model is applicable on embedded and mobile devices for simultaneous localization and mapping. It can also increase the efficiency of server-side image search services.
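
The training scheme outlined in the Method part of the abstract (a compact model fitted to the outputs of the original model, with no image labels) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the "teacher" is a small fixed two-layer network standing in for the original model, the "student" is a linear map with fewer parameters, "images" are random 4-element vectors, and the loss is plain mean squared error. The abstract does not state the actual architectures, objective function or training parameters, so this is not the authors' implementation.

```python
import math
import random

random.seed(0)

N_IN, N_HIDDEN, N_OUT = 4, 8, 2

# Hypothetical teacher: a fixed two-layer network (48 weights) that stands in
# for the original, larger model. Its weights are never updated.
T_W1 = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
T_W2 = [[random.uniform(-1, 1) for _ in range(N_HIDDEN)] for _ in range(N_OUT)]


def teacher(x):
    """Output of the original model for one input vector."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in T_W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in T_W2]


def student(weights, x):
    """Output of the compact model: a single linear layer (8 weights)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    """Mean squared error between two output vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def distill(images, epochs=200, lr=0.05):
    """Fit the student to the teacher's outputs with per-sample gradient descent."""
    weights = [[0.0] * N_IN for _ in range(N_OUT)]
    # Both models see the same inputs; teacher outputs play the role of labels.
    targets = [teacher(x) for x in images]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(images, targets):
            y = student(weights, x)
            total += mse(y, t)
            for k in range(N_OUT):
                grad_out = 2.0 * (y[k] - t[k]) / N_OUT  # d(MSE)/d(y[k])
                for j in range(N_IN):
                    weights[k][j] -= lr * grad_out * x[j]
        history.append(total / len(images))
    return weights, history


# Unlabeled toy "images": random vectors, no ground-truth key points involved.
images = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(64)]
weights, history = distill(images)
print(f"mean loss: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

The point of the sketch is only the data flow: no labels enter the loop, and the compact model is supervised entirely by the original model's responses to the same inputs. In a real setting both models would be convolutional networks producing key point score maps and descriptors.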

Keywords: deep learning, keypoint detection, local image descriptors

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License