doi: 10.17586/2226-1494-2020-20-4-525-531


AUTOMATED HAND DETECTION METHOD FOR TASKS OF GESTURE RECOGNITION IN HUMAN-MACHINE INTERFACES

D. A. Ryumin


Article in Russian

For citation:
Ryumin D. Automated hand detection method for tasks of gesture recognition in human-machine interfaces. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2020, vol. 20, no. 4, pp. 525–531 (in Russian).
doi: 10.17586/2226-1494-2020-20-4-525-531


Abstract
Subject of Research. The paper presents a method for the automatic analysis and recognition of human hand gestures. Recognition of sign language elements is a topical task in the modern information world. The problem of efficient gesture recognition remains unsolved owing to the cultural diversity of the world's sign languages and the variability of the conditions under which gestures are shown; it is further complicated by the small size of the fingers. Method. The proposed method is based on the analysis of frame sequences of a video stream captured by an optical camera. To process the obtained video sequences, it uses a depth map together with a combination of modern classifiers built on the Single Shot MultiBox Detector deep neural network architecture with a reduced ResNet-10 network model, NASNetMobile, and an LSTM. Main Results. Experiments on automatic real-time video analysis of hand movements and gesture recognition demonstrate the high potential of the proposed method for human-machine interaction tasks. The recognition accuracy on 48 one-handed gestures from the TheRuSLan database is 79 %, which surpasses other approaches to this problem. Practical Relevance. The results can be applied in automatic sign language recognition systems, as well as in situations requiring contactless interaction of various user groups, for example, people with hearing and vision impairments, with mobile information robots through automatic recognition of sign information.
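To make the pipeline named in the Method section concrete, below is a minimal Python sketch of its three stages: an SSD detector with a reduced ResNet-10 backbone localizes a region of interest in each frame, NASNetMobile extracts per-frame features, and an LSTM classifies the gesture over the frame sequence. The model file names, the 1056-dimensional feature size, the 256-unit LSTM, the confidence threshold, and the trained weights are illustrative assumptions, not the author's implementation; the depth-map processing step described in the abstract is omitted here.

```python
# Sketch of an SSD/ResNet-10 -> NASNetMobile -> LSTM gesture pipeline.
# Detector paths and the classifier head are hypothetical placeholders.
import cv2
import numpy as np
import tensorflow as tf

# SSD + reduced ResNet-10 detector (OpenCV ships a face variant of this
# architecture; the file names below stand in for a trained detector).
detector = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                                    "res10_300x300_ssd_iter_140000.caffemodel")

# NASNetMobile as a frozen per-frame feature extractor (1056-D pooled output).
backbone = tf.keras.applications.NASNetMobile(include_top=False,
                                              pooling="avg",
                                              input_shape=(224, 224, 3))
backbone.trainable = False

# LSTM head over a sequence of frame features; 48 one-handed gesture classes,
# matching the TheRuSLan experiment reported in the abstract.
classifier = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 1056)),
    tf.keras.layers.LSTM(256),
    tf.keras.layers.Dense(48, activation="softmax"),
])

def detect_region(frame, conf_threshold=0.5):
    """Return the most confident detection box (x1, y1, x2, y2) or None."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    detector.setInput(blob)
    detections = detector.forward()   # shape: (1, 1, N, 7), sorted by score
    best = detections[0, 0, 0]        # [image_id, label, conf, x1, y1, x2, y2]
    if best[2] < conf_threshold:
        return None
    box = (best[3:7] * np.array([w, h, w, h])).astype(int)
    return tuple(box)

def classify_sequence(frames):
    """Crop detected regions, embed each frame, classify the whole sequence."""
    feats = []
    for frame in frames:
        box = detect_region(frame)
        if box is None:
            continue                  # skip frames with no confident detection
        x1, y1, x2, y2 = box
        crop = cv2.resize(frame[max(y1, 0):y2, max(x1, 0):x2], (224, 224))
        crop = tf.keras.applications.nasnet.preprocess_input(
            crop.astype(np.float32))
        feats.append(backbone(crop[None])[0])
    if not feats:
        return None
    probs = classifier(tf.stack(feats)[None])   # (1, T, 1056) -> (1, 48)
    return int(tf.argmax(probs, axis=-1)[0])
```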

Keywords: hand movement video analysis, depth map, gesture recognition, face detection, deep neural networks

References
1. Karpov A.A., Yusupov R.M. Multimodal interfaces of human–computer interaction. Herald of the Russian Academy of Sciences, 2018, vol. 88, no. 1, pp. 67–74. doi: 10.1134/S1019331618010094
2. Kagirov I., Karpov A., Kipyatkova I., Klyuzhev K., Kudryavcev A., Kudryavcev I., Ryumin D. Lower limbs exoskeleton control system based on intelligent human-machine interface. Studies in Computational Intelligence, 2020, vol. 868, pp. 457–466. doi: 10.1007/978-3-030-32258-8_54
3. Parker L.E., Rus D., Sukhatme G.S. Multiple mobile robot systems. Springer Handbook of Robotics. Springer, Cham, 2016, pp. 1335–1384. doi: 10.1007/978-3-319-32552-1_53
4. Ryumin D., Kagirov I., Ivanko D., Axyonov A., Karpov A.A. Automatic detection and recognition of 3D manual gestures for human-machine interaction. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives, 2019, vol. 42, no. 2/W12, pp. 179–183. doi: 10.5194/isprs-archives-XLII-2-W12-179-2019
5. Mahmud S., Lin X., Kim J.H. Interface for Human Machine Interaction for assistant devices: A Review. Proc. 10th Annual Computing and Communication Workshop and Conference (CCWC), 2020, pp. 768–773. doi: 10.1109/CCWC47524.2020.9031244
6. Ivanko D., Ryumin D., Kipyatkova I., Axyonov A., Karpov A. Lip-reading using pixel-based and geometry-based features for multimodal human-robot interfaces. Smart Innovation, Systems and Technologies, 2020, vol. 154, pp. 477–486. doi: 10.1007/978-981-13-9267-2_39
7. Janssen C.P., Donker S.F., Brumby D.P., Kun A.L. History and future of human-automation interaction. International Journal of Human Computer Studies, 2019, vol. 131, pp. 99–107. doi: 10.1016/j.ijhcs.2019.05.006
8. Prostejovsky A.M., Brosinsky C., Heussen K., Westermann D., Kreusel J., Marinelli M. The future role of human operators in highly automated electric power systems. Electric Power Systems Research, 2019, vol. 175, art. no. 105883. doi: 10.1016/j.epsr.2019.105883
9. Chakraborty B.K., Sarma D., Bhuyan M.K., MacDorman K.F. Review of constraints on vision-based gesture recognition for human-computer interaction. IET Computer Vision, 2018, vol. 12, no. 1, pp. 3–15. doi: 10.1049/iet-cvi.2017.0052
10. Dey D., Habibovic A., Pfleging B., Martens M., Terken J. Color and animation preferences for a light band eHMI in interactions between automated vehicles and pedestrians. Proc. of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13. doi: 10.1145/3313831.3376325
11. Biondi F., Alvarez I., Jeong K.A. Human–Vehicle cooperation in automated driving: A multidisciplinary review and appraisal. International Journal of Human-Computer Interaction, 2019, vol. 35, no. 11, pp. 932–946. doi: 10.1080/10447318.2018.1561792
12. Kennedy J., Lemaignan S., Montassier C., Lavalade P., Irfan B., Papadopoulos F., Senft E., Belpaeme T. Child speech recognition in human–robot interaction: evaluations and recommendations. Proc. 12th ACM/IEEE International Conference on Human-Robot Interaction, 2017, pp. 82–90. doi: 10.1145/2909824.3020229
13. Kipyatkova I. LSTM-based language models for very large vocabulary continuous Russian speech recognition system. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2019, vol. 11658, pp. 219–226. doi: 10.1007/978-3-030-26061-3_23
14. Ryumin D., Karpov A.A. Towards automatic recognition of sign language gestures using Kinect 2.0. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, vol. 10278, pp. 89–101. doi: 10.1007/978-3-319-58703-5_7
15. Mazhar O., Ramdani S., Navarro B., Passama R. A Framework for real-time physical Human-Robot Interaction using hand gestures. Proc. of the 2018 IEEE Workshop on Advanced Robotics and its Social Impacts (ARSO), 2018, pp. 46–47. doi: 10.1109/ARSO.2018.8625753
16. Ryumin D. Detection and recognition method of 3D single-handed gestures for human-machine interaction. Proc. Conferences of Young Scientists, 2019. Available at: https://kmu.itmo.ru/digests/article/1902 (accessed: 13.05.2020). (in Russian)
17. Kagirov I., Ryumin D., Axyonov A. Method for multimodal recognition of one-handed sign language gestures through 3D convolution and LSTM neural networks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2019, vol. 11658, pp. 191–200. doi: 10.1007/978-3-030-26061-3_20
18. Kagirov I.A., Ryumin D.A., Axyonov A.A., Karpov A.A. Multimedia database of Russian sign language items in 3D. Voprosy Jazykoznanija, 2020, no. 1, pp. 104–123. (in Russian). doi: 10.31857/S0373658X0008302-1
19. Liu W., Anguelov D., Erhan D., Szegedy C., Reed S., Fu C.-Y., Berg A. SSD: single shot multibox detector. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, vol. 9905, pp. 21–37. doi: 10.1007/978-3-319-46448-0_2
20. He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. Proc. 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778. doi: 10.1109/CVPR.2016.90
21. King D.E. Max-margin object detection. arXiv preprint arXiv:1502.00046, 2015.
22. Parkhi O.M., Vedaldi A., Zisserman A. Deep face recognition. Proc. 26th British Machine Vision Conference (BMVC), 2015, pp. 41.1–41.12. doi: 10.5244/C.29.41
23. Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017, vol. 60, no. 6, pp. 84–90. doi: 10.1145/3065386
24. Everingham M., Van Gool L., Williams C.K., Winn J., Zisserman A. The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 2010, vol. 88, no. 2, pp. 303–338. doi: 10.1007/s11263-009-0275-4


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.