doi: 10.17586/2226-1494-2024-24-5-779-787


ViSL model: The model automatically generates sentences of Vietnamese sign language

K. Dang, I. A. Bessmertny


Article in English

For citation:
Dang Kh., Bessmertny I.A. ViSL model: The model automatically generates sentences of Vietnamese sign language. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 5, pp. 779–787. doi: 10.17586/2226-1494-2024-24-5-779-787


Abstract
The main problem in building intelligent systems is the lack of data for machine learning, which is especially acute for sign language recognition for the deaf and hard of hearing. One way to increase the amount of training data is synthesis. Unlike speech synthesis, in Vietnamese and some other languages it is impossible to create a sequence of gestures that exactly repeats the text: the gesture dictionary is severely limited, and the word order in sentences differs. The aim of this work is to enrich the training corpus of video data used to build recognition systems for Vietnamese Sign Language (ViSL). Since the words of the source text cannot be mapped to gestures one-to-one, the task of translating from an ordinary language into a sign language arises. The paper proposes a two-phase process for this. The first phase pre-processes the text: the text format is normalized, words and sentences are segmented, and the words are then encoded using the sign language dictionary. At this stage punctuation marks and stop words are not removed, since they contribute to the accuracy of the N-gram model. In the second phase, instead of syntactic analysis, a statistical method is used to form the sequence of gestures. It is based on a Markov model over the word transition graph in which the probability of the next word depends only on the two previous words; the transition probabilities are calculated on the existing annotated ViSL corpus. Breadth-first search is used to compile the list of all sentences generated from a given grammatical rule and a matrix of semantic interactions between words. The inverse of the logarithm of the product of the probabilities of co-occurrence of consecutive 3-word phrases in a sentence is used to estimate the frequency of occurrence of that sentence in a given data set. Based on ViSL data of 3,234 words, we calculated probability matrices representing the relationships between words from Vietnamese natural language data comprising 50 million sentences collected from Vietnamese newspapers and magazines. For different grammar rules, we compare the number of generated sentences and evaluate the accuracy of the 50 most frequent sentences; the average accuracy is 88 %. The accuracy of the generated sentences is assessed manually by statistical methods. The number of generated sentences depends on the number of words labeled with the parts of speech used in the grammar rules, and the semantic accuracy of the generated sentences is very high when the search words carry correct part-of-speech tags. Compared with machine learning methods, the proposed method gives very good results for languages, such as Vietnamese, that lack inflection and follow fixed word-order rules, and it does not require large computational resources. Its disadvantage is that the accuracy depends strongly on word type, sentence type, and word segmentation, and the word relationships depend on the observed dataset. A future research direction is the generation of whole paragraphs in sign language. The obtained data can be used in machine learning models for sign language processing tasks.
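
The statistical core of the second phase can be illustrated with a minimal Python sketch: a second-order Markov (trigram) model in which the probability of a word depends only on the two previous words, a breadth-first enumeration of candidate sentences over the slots of a grammatical rule constrained by the semantic interaction matrix, and the sentence score described above as the inverse of the logarithm of the product of consecutive trigram probabilities. This is only a sketch under assumptions: the names train_trigram_model, generate_sentences, score_sentence, lexicon and compatible are hypothetical and not taken from the paper, and the additive smoothing constant is an assumption.

# Minimal illustrative sketch (hypothetical names, not the authors' code) of the
# pipeline described in the abstract: a trigram Markov model, BFS generation of
# sentences from a grammar rule, and the inverse-log-product sentence score.
import math
from collections import defaultdict, deque

def train_trigram_model(corpus, alpha=1e-6):
    """Estimate P(w3 | w1, w2) from tokenized sentences; alpha is an assumed
    additive smoothing constant so unseen trigrams do not zero out the product."""
    bigram = defaultdict(int)
    trigram = defaultdict(int)
    for sent in corpus:
        tokens = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(len(tokens) - 2):
            bigram[(tokens[i], tokens[i + 1])] += 1
            trigram[(tokens[i], tokens[i + 1], tokens[i + 2])] += 1

    def prob(w1, w2, w3):
        return (trigram[(w1, w2, w3)] + alpha) / (bigram[(w1, w2)] + alpha)

    return prob

def generate_sentences(rule, lexicon, compatible):
    """Breadth-first expansion of a grammatical rule given as a sequence of
    part-of-speech tags, e.g. ["N", "V", "N"]. `lexicon` maps a tag to the
    sign-dictionary words carrying that tag; `compatible(w1, w2)` stands in for
    the semantic interaction matrix between consecutive words."""
    queue = deque([[]])
    for tag in rule:                      # expand the sentence slot by slot
        next_level = deque()
        while queue:
            partial = queue.popleft()
            for word in lexicon[tag]:
                if not partial or compatible(partial[-1], word):
                    next_level.append(partial + [word])
        queue = next_level
    return list(queue)

def score_sentence(sentence, prob):
    """Inverse of the logarithm of the product of the probabilities of the
    consecutive 3-word windows of the sentence."""
    tokens = ["<s>", "<s>"] + sentence + ["</s>"]
    log_product = sum(math.log(prob(tokens[i], tokens[i + 1], tokens[i + 2]))
                      for i in range(len(tokens) - 2))
    return 1.0 / log_product  # log_product < 0 for any non-trivial sentence

In this sketch, the candidate sentences produced by generate_sentences for a rule would be ranked with score_sentence, mirroring the selection and evaluation of the most frequent sentences per grammar rule described above.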

Keywords: Vietnamese sign language, sign language model, automatic sentence generation, n-gram, Markov model, breadth-first search, data enrichment, grammatical rules


