doi: 10.17586/2226-1494-2023-23-5-980-988


Segmentation of word gestures in sign language video

K. Dang, I. A. Bessmertny


Article in Russian

For citation:
Dang Khanh, Bessmertny I.A. Segmentation of word gestures in sign language video. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2023, vol. 23, no. 5, pp. 980–988 (in Russian). doi: 10.17586/2226-1494-2023-23-5-980-988


Abstract
Despite the widespread use of automatic speech recognition and video subtitles, sign language remains an important communication channel for people with hearing impairments. A key task in automatic sign language recognition is segmenting video into fragments corresponding to individual words. In contrast to known methods of segmenting sign language words, the paper proposes an approach that does not require sensors (accelerometers). To segment the video into words, this study estimates the dynamics of the image, and the boundary between words is determined using a threshold value. Since the frame may contain moving objects other than the speaker that create noise, the dynamics are estimated as the average frame-to-frame change in the Euclidean distance between the coordinate characteristics of the hands, forearms, eyes, and mouth. The coordinate characteristics of the hands and head are computed using the MediaPipe library. The developed algorithm was tested for Vietnamese sign language on an open set of 4364 videos collected at the Vietnamese Sign Language Training Center; it demonstrated accuracy comparable to manual segmentation of the video by an operator together with low resource consumption, which makes the algorithm suitable for automatic gesture recognition in real time. The experiments have shown that, unlike the known methods, the task of sign language segmentation can be solved effectively without sensors. Like other gesture segmentation methods, the proposed algorithm does not perform satisfactorily at high signing speed, when words overlap each other. This problem is the subject of further research.
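The thresholding idea described in the abstract can be illustrated with a minimal sketch. It assumes the per-frame landmark coordinates (hands, forearms, eyes, mouth) have already been extracted, e.g. with MediaPipe; the function names, the feature layout, and the boundary rule (a gesture spans frames whose motion score stays above the threshold) are illustrative assumptions, not the authors' exact implementation.

```python
import math

def motion_score(prev, curr):
    # Average Euclidean distance between corresponding landmarks of
    # two consecutive frames. Each frame is a list of (x, y) points
    # (hypothetical layout; in practice taken from MediaPipe output).
    return sum(math.dist(p, c) for p, c in zip(prev, curr)) / len(prev)

def segment_words(frames, threshold):
    # Split a landmark sequence into word fragments: a gesture is a
    # maximal run of frames whose motion score is >= threshold; the
    # low-motion pauses between runs are treated as word boundaries.
    segments, start = [], None
    for i in range(1, len(frames)):
        moving = motion_score(frames[i - 1], frames[i]) >= threshold
        if moving and start is None:
            start = i - 1                    # gesture begins
        elif not moving and start is not None:
            segments.append((start, i - 1))  # gesture ends at the pause
            start = None
    if start is not None:                    # gesture runs to the end
        segments.append((start, len(frames) - 1))
    return segments
```

In a real pipeline the threshold would be tuned on annotated video, and the raw motion score would typically be smoothed over a few frames to suppress landmark jitter.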

Keywords: sign language, word gesture segmentation, MediaPipe, LSTM, thresholding method, sign language recognition



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.