doi: 10.17586/2226-1494-2020-20-4-525-531
AUTOMATED HAND DETECTION METHOD FOR TASKS OF GESTURE RECOGNITION IN HUMAN-MACHINE INTERFACES
Article in Russian
For citation:
Ryumin D. Automated hand detection method for tasks of gesture recognition in human-machine interfaces. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2020, vol. 20, no. 4, pp. 525–531 (in Russian). doi: 10.17586/2226-1494-2020-20-4-525-531
Abstract
Subject of Research. The paper presents a solution for automatic analysis and recognition of human hand gestures. Recognition of sign language elements is a topical task in the modern information world. Gesture recognition remains an open problem because of the cultural diversity of the world's sign languages and the varying conditions under which gestures are shown; the small size of the fingers complicates it further. Method. The presented method is based on the analysis of frame sequences of a video stream obtained with an optical camera. To process the resulting video sequences, the method uses a depth map and a combination of modern classifiers built on the Single Shot MultiBox Detector deep neural network architecture with a reduced ResNet-10 network model, NASNetMobile, and an LSTM. Main Results. Experiments on automatic real-time video analysis of hand movements and gesture recognition show the great potential of the proposed method for human-machine interaction tasks. The recognition accuracy for 48 one-handed gestures on the TheRuSLan database is 79 %, which improves on other approaches to this problem. Practical Relevance. The results can be used in automatic sign language recognition systems, as well as wherever contactless interaction with various user groups is required, for example, with people with hearing and vision impairments, or with mobile information robots that recognize sign information automatically.
Keywords: hand movement video analysis, depth map, gesture recognition, face detection, deep neural networks
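
To make the pipeline described in the abstract concrete, below is a minimal Python sketch (not the authors' implementation) of the two-stage scheme: an SSD detector with a reduced ResNet-10 backbone localizes the hand in each frame, and NASNetMobile features extracted from the hand crops are classified over the frame sequence by an LSTM. The model file names, the sequence length, and the LSTM size are illustrative assumptions, the classifier is untrained here, and the depth-map processing mentioned in the abstract is omitted.

# Minimal sketch of the two-stage pipeline, under the assumptions stated above.
import cv2
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import NASNetMobile
from tensorflow.keras.applications.nasnet import preprocess_input

NUM_GESTURES = 48  # one-handed gestures in TheRuSLan, per the abstract
SEQ_LEN = 32       # assumed number of frames per gesture clip

# Stage 1: SSD detector with a reduced ResNet-10 backbone, loaded through
# OpenCV's DNN module. The file names are placeholders; any SSD model
# trained to detect hands would slot in here.
detector = cv2.dnn.readNetFromCaffe("ssd_resnet10.prototxt",
                                    "ssd_resnet10.caffemodel")

def detect_hand(frame, conf_threshold=0.5):
    """Return the highest-confidence hand box (x1, y1, x2, y2) or None."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 scalefactor=1.0, size=(300, 300),
                                 mean=(104.0, 117.0, 123.0))
    detector.setInput(blob)
    detections = detector.forward()  # SSD output shape: (1, 1, N, 7)
    best = None
    for i in range(detections.shape[2]):
        confidence = float(detections[0, 0, i, 2])
        if confidence > conf_threshold and (best is None or confidence > best[0]):
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            best = (confidence, box.astype(int))
    return None if best is None else tuple(best[1])

# Stage 2: per-frame NASNetMobile features (1056-d with average pooling),
# collected into a sequence and classified by an LSTM.
cnn = NASNetMobile(weights="imagenet", include_top=False, pooling="avg",
                   input_shape=(224, 224, 3))

def build_classifier(seq_len=SEQ_LEN, feat_dim=1056):
    inputs = layers.Input(shape=(seq_len, feat_dim))
    x = layers.LSTM(256)(inputs)
    outputs = layers.Dense(NUM_GESTURES, activation="softmax")(x)
    return Model(inputs, outputs)

# Weights would come from training on TheRuSLan; the model is untrained here.
classifier = build_classifier()

def classify_clip(frames):
    """frames: list of BGR frames from the video stream; returns a class id."""
    feats = []
    for frame in frames:
        box = detect_hand(frame)
        if box is None:
            continue
        x1, y1, x2, y2 = box
        crop = frame[max(y1, 0):y2, max(x1, 0):x2]
        crop = cv2.cvtColor(cv2.resize(crop, (224, 224)), cv2.COLOR_BGR2RGB)
        feats.append(cnn.predict(preprocess_input(
            crop[np.newaxis].astype(np.float32)), verbose=0)[0])
    if not feats:
        return None  # no hand found in any frame
    seq = tf.keras.preprocessing.sequence.pad_sequences(
        [feats], maxlen=SEQ_LEN, dtype="float32",
        padding="post", truncating="post")
    return int(np.argmax(classifier.predict(seq, verbose=0)))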