doi: 10.17586/2226-1494-2020-20-3-402-409


DISTILLATION OF NEURAL NETWORK MODELS FOR DETECTION AND DESCRIPTION OF IMAGE KEY POINTS

A. V. Yashchenko, A. V. Belikov, M. V. Peterson, A. S. Potapov


Article in Russian

For citation: Yashchenko A.V., Belikov A.V., Peterson M.V., Potapov A.S. Distillation of neural network models for detection and description of image key points. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2020, vol. 20, no. 3, pp. 402–409 (in Russian). doi: 10.17586/2226-1494-2020-20-3-402-409



Abstract
Subject of Research. Image matching and classification methods, as well as simultaneous localization and mapping (SLAM), are widely used on embedded and mobile devices. Their most resource-intensive part is the detection and description of image key points. Classical methods for key point detection and description can run in real time on mobile devices, but modern neural network methods, which give better quality, are difficult to run this way because of their computational cost. Speeding up neural network models for key point detection and description is therefore a topical problem. The subject of research is distillation as one of the methods for reducing neural network models. The aim of the study is to obtain a more compact model for detection and description of key points and to describe the procedure for designing this model.
Method. We propose a method for pairing the original model with a new, more compact model, which is then trained on the output values of the original model. The new model thus learns to reconstruct the output of the original model without using image labels. Both networks receive identical images as input.
Main Results. The neural network distillation method for detection and description of key points was tested. An objective function and training parameters that give the best results within the framework of the study are proposed. A new data set was created for testing key point detection methods, and a new quality indicator for the detected key points and their corresponding local features was introduced. A new model with the same number of parameters, trained in the proposed way, shows higher key point accuracy than the original model. A new model with significantly fewer parameters shows point matching accuracy close to that of the original model.
Practical Relevance. The proposed method yields a more compact model for detection and description of image key points. The model is applicable on embedded and mobile devices for simultaneous localization and mapping. It can also increase the efficiency of server-side image search services.
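
The training scheme outlined under Method (a compact model fitted to the outputs of the original model on identical inputs, with no labels) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the article: the "teacher" is a small fixed random network standing in for the original model, the "student" is a single linear layer, the inputs are random vectors rather than images, and the mean squared error is used as the objective, since the abstract does not state the objective function the authors chose.

```python
import math
import random

random.seed(0)

IN_DIM, HIDDEN, OUT_DIM = 4, 16, 2

# Hypothetical teacher: a fixed two-layer network standing in for the original model.
W1 = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(HIDDEN)]
W2 = [[random.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(OUT_DIM)]

def teacher(x):
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

# Hypothetical student: one linear layer with far fewer parameters than the teacher.
def make_student():
    return {"W": [[0.0] * IN_DIM for _ in range(OUT_DIM)], "b": [0.0] * OUT_DIM}

def student_forward(s, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(s["W"], s["b"])]

def distillation_loss(s, inputs, targets):
    """Mean squared error between student and teacher outputs (assumed objective)."""
    total = 0.0
    for x, t in zip(inputs, targets):
        total += sum((a - b) ** 2 for a, b in zip(student_forward(s, x), t))
    return total / len(inputs)

def train_step(s, inputs, targets, lr=0.1):
    """One full-batch gradient-descent step on the distillation loss."""
    n = len(inputs)
    grad_W = [[0.0] * IN_DIM for _ in range(OUT_DIM)]
    grad_b = [0.0] * OUT_DIM
    for x, t in zip(inputs, targets):
        out = student_forward(s, x)
        for j in range(OUT_DIM):
            err = 2.0 * (out[j] - t[j]) / n
            grad_b[j] += err
            for i in range(IN_DIM):
                grad_W[j][i] += err * x[i]
    for j in range(OUT_DIM):
        s["b"][j] -= lr * grad_b[j]
        for i in range(IN_DIM):
            s["W"][j][i] -= lr * grad_W[j][i]

# Unlabeled inputs: both models see the same data, and the only training
# signal is the frozen teacher's output, computed once up front.
inputs = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(200)]
targets = [teacher(x) for x in inputs]

student = make_student()
loss_before = distillation_loss(student, inputs, targets)
for _ in range(200):
    train_step(student, inputs, targets)
loss_after = distillation_loss(student, inputs, targets)
print(f"distillation loss: {loss_before:.4f} -> {loss_after:.4f}")
```

In this sketch the student cannot reproduce the teacher exactly because it has no nonlinearity, so the loss falls but does not reach zero. That mirrors, in a very reduced form, the trade-off between model size and closeness to the original model reported in the results.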

Keywords: deep learning, keypoint detection, local image descriptors



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
