doi: 10.17586/2226-1494-2022-22-3-585-593

A method of multimodal machine sign language translation for natural human-computer interaction

A. A. Axyonov, I. A. Kagirov, D. A. Ryumin

Article in Russian

For citation:
Axyonov A.A., Kagirov I.A., Ryumin D.A. A method of multimodal machine sign language translation for natural human-computer interaction. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 3, pp. 585–593 (in Russian). doi: 10.17586/2226-1494-2022-22-3-585-593

This paper investigates whether the robustness of an automatic system for recognizing isolated signs and sign languages can be improved by using the most informative spatiotemporal visual features. The authors present a method for the automatic recognition of gestural information based on an integrated neural network model that analyses the following spatiotemporal visual features: 2D and 3D distances between the palm and the face; the area of intersection of the hand and the face; hand configuration; and the gender and age of the signer. A neural network model based on 3DResNet-18 was developed to extract hand configuration data. Neural network models from the Deepface software platform were embedded in the method to extract gender and age data. The proposed method was tested on data from TheRuSLan, a multimodal corpus of sign language elements, and achieved an accuracy of 91.14 %. The results not only improve the accuracy and robustness of machine sign language translation but also enhance the naturalness of human-machine interaction in general. They are also applicable in social services, medicine, education and robotics, as well as in public service centers.
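The geometric features named in the abstract (palm-to-face distances and the hand/face intersection area) can be illustrated with a minimal Python sketch. The abstract does not say how these quantities are computed, so everything here is an assumption made for illustration: axis-aligned bounding boxes given as (x1, y1, x2, y2), distances measured between box centres or between 3D points in a common unit, and the hypothetical function names box_center, distance_2d, distance_3d, intersection_area and frame_features.

```python
import math


def box_center(box):
    """Centre of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def distance_2d(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def distance_3d(p, q):
    """Euclidean distance between two 3D points in the same unit."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def intersection_area(a, b):
    """Overlap area of two axis-aligned boxes; 0 if they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def frame_features(hand_box, face_box, palm_3d, face_3d):
    """Collect the per-frame geometric features into one dictionary.

    Assumption: the 2D distance is taken between box centres; the
    source does not state which reference points are used.
    """
    return {
        "dist_2d": distance_2d(box_center(hand_box), box_center(face_box)),
        "dist_3d": distance_3d(palm_3d, face_3d),
        "overlap_area": intersection_area(hand_box, face_box),
    }
```

For example, frame_features((100, 120, 180, 220), (150, 60, 250, 180), (0.10, -0.20, 0.80), (0.10, 0.10, 1.20)) returns one feature dictionary for a single video frame. Computing it for every frame yields a time series that could be fed to a sequence model alongside the hand configuration, gender and age features.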

Keywords: body language, gesticulation, machine sign language translation, naturalness of a communication medium

Acknowledgements. This research is financially supported by the Russian Science Foundation (No. 21-71-00141,



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright © 2001-2022 Scientific and Technical Journal of Information Technologies, Mechanics and Optics. All rights reserved.