doi: 10.17586/2226-1494-2023-23-5-980-988


Segmentation of word gestures in sign language video

K. Dang, I. A. Bessmertny


Article in Russian

For citation:
Dang Khanh, Bessmertny I.A. Segmentation of word gestures in sign language video. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2023, vol. 23, no. 5, pp. 980–988 (in Russian). doi: 10.17586/2226-1494-2023-23-5-980-988


Abstract
Despite the widespread use of automatic speech recognition and video subtitles, sign language remains an important communication channel for people with hearing impairments. A key task in automatic sign language recognition is segmenting video into fragments corresponding to individual words. In contrast to known methods of segmenting sign language words, the paper proposes an approach that does not require sensors (accelerometers). To segment the video into words, this study estimates the dynamics of the image, and the boundary between words is determined using a threshold value. Since the frame may contain moving objects other than the speaker that create noise, the dynamics are estimated as the average frame-to-frame change in the Euclidean distance between the coordinate characteristics of the hands, forearms, eyes, and mouth. The coordinate characteristics of the hands and head are computed using the MediaPipe library. The developed algorithm was tested for Vietnamese sign language on an open set of 4364 videos collected at the Vietnamese Sign Language Training Center; it demonstrated accuracy comparable to manual segmentation of the video by an operator together with low resource consumption, which makes the algorithm suitable for automatic gesture recognition in real time. The experiments have shown that, unlike the known methods, the task of sign language segmentation can be solved effectively without sensors. Like other gesture segmentation methods, the proposed algorithm does not perform satisfactorily at high signing speed, when words overlap each other. This problem is the subject of further research.
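The thresholding idea described in the abstract can be illustrated with a minimal sketch. It assumes the per-frame landmark coordinates (hands, forearms, eyes, mouth) have already been extracted, e.g. with MediaPipe; the function names, the feature layout, and the boundary rule (a gesture spans frames whose motion score stays above the threshold) are illustrative assumptions, not the authors' exact implementation.

```python
import math

def motion_score(prev, curr):
    # Average Euclidean distance between corresponding landmarks of
    # two consecutive frames. Each frame is a list of (x, y) points
    # (hypothetical layout; in practice taken from MediaPipe output).
    return sum(math.dist(p, c) for p, c in zip(prev, curr)) / len(prev)

def segment_words(frames, threshold):
    # Split a landmark sequence into word fragments: a gesture is a
    # maximal run of frames whose motion score is >= threshold; the
    # low-motion pauses between runs are treated as word boundaries.
    segments, start = [], None
    for i in range(1, len(frames)):
        moving = motion_score(frames[i - 1], frames[i]) >= threshold
        if moving and start is None:
            start = i - 1                    # gesture begins
        elif not moving and start is not None:
            segments.append((start, i - 1))  # gesture ends at the pause
            start = None
    if start is not None:                    # gesture runs to the end
        segments.append((start, len(frames) - 1))
    return segments
```

In a real pipeline the threshold would be tuned on annotated video, and the raw motion score would typically be smoothed over a few frames to suppress landmark jitter.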

Keywords: sign language, word gesture segmentation, MediaPipe, LSTM, thresholding method, sign language recognition



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.