A method of multimodal machine sign language translation for natural human-computer interaction

Axyonov Alexandr A., Kagirov Ildar A., Ryumin Dmitry A.

2022, Volume 22, Number 3 (March-April)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2022-22-3-585-593

Article in Russian

For citation:

Axyonov A.A., Kagirov I.A., Ryumin D.A. A method of multimodal machine sign language translation for natural human-computer interaction. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 3, pp. 585–593 (in Russian). doi: 10.17586/2226-1494-2022-22-3-585-593

Abstract

This paper investigates whether the robustness of an automatic system for recognizing isolated signs and sign languages can be improved by using the most informative spatiotemporal visual features. The authors present a method for the automatic recognition of gestural information based on an integrated neural network model that analyzes the following spatiotemporal visual features: 2D and 3D distances between the palm and the face; the area of intersection between the hand and the face; hand configuration; and the gender and age of signers. A neural network model based on 3DResNet-18 was developed to extract hand configuration data. Neural network models from the Deepface software platform were embedded in the method to extract gender- and age-related data. The proposed method was tested on data from TheRuSLan, a multimodal corpus of sign language elements, and achieved an accuracy of 91.14%. The results of this investigation not only improve the accuracy and robustness of machine sign language translation but also enhance the naturalness of human-machine interaction in general. In addition, the results are applicable in social services, medicine, education, and robotics, as well as in public service centers.
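The geometric features named in the abstract (2D and 3D palm-to-face distances and the hand-face intersection area) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Box` type, the use of bounding-box centers as palm and face positions, the per-region depth values, and all function names are assumptions made for illustration, and the 3D distance is only meaningful if depth is expressed in the same units as the image coordinates.

```python
import math
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Axis-aligned bounding box (hypothetical representation of a detection)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def center(box: Box) -> tuple[float, float]:
    """Center point of a box, used here as a stand-in for palm or face position."""
    return ((box.x_min + box.x_max) / 2.0, (box.y_min + box.y_max) / 2.0)


def distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Euclidean distance; accepts 2D (x, y) or 3D (x, y, depth) points."""
    return math.dist(p, q)


def intersection_area(a: Box, b: Box) -> float:
    """Area of overlap between two boxes; 0.0 when they do not overlap."""
    width = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    height = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    return max(0.0, width) * max(0.0, height)


def frame_features(hand: Box, face: Box,
                   hand_depth: float, face_depth: float) -> list[float]:
    """Per-frame feature vector: [2D distance, 3D distance, overlap area].

    Assumption: depth values come from a depth sensor and share units
    with the image coordinates.
    """
    hx, hy = center(hand)
    fx, fy = center(face)
    return [
        distance((hx, hy), (fx, fy)),
        distance((hx, hy, hand_depth), (fx, fy, face_depth)),
        intersection_area(hand, face),
    ]
```

In a pipeline of the kind the abstract describes, such per-frame vectors would presumably be stacked over time and combined with hand-configuration and signer-attribute features before classification; that fusion step is not shown here.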

Keywords: body language, gesticulation, machine sign language translation, naturalness of a communication medium

Acknowledgements. This research was financially supported by the Russian Science Foundation (No. 21-71-00141, https://rscf.ru/en/project/21-71-00141/).

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License