BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

Alexey A. Karpov, Milos Zelezny

2014, VOLUME 14, NUMBER 5 (SEPTEMBER-OCTOBER)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)





The full article is published in Russian.

Abstract

We present a conceptual model, architecture and software of a multimodal system that synthesizes audio-visual speech and sign language from input text. The main components of the developed multimodal synthesis system (a signing avatar) are: an automatic text processor for analyzing the input text; a 3D simulation model of a human head; a computer text-to-speech synthesizer; a system for audio-visual speech synthesis; a 3D simulation model of human hands and upper body; and a multimodal user interface that integrates all these components to generate audio, visual and signed speech. The system automatically translates input text into speech (audio information) and gestures (video information), fuses this information, and outputs the result as multimedia. A user can enter any grammatically correct text in Russian or Czech; the text processor analyzes it to detect sentences, words and characters. This textual information is then converted into symbols of a sign language notation. We use the international Hamburg Notation System (HamNoSys), which describes the main distinctive features of each manual sign: hand shape, hand orientation, location and type of movement. From these descriptions, the 3D signing avatar renders the elements of the sign language. The virtual 3D model of the human head and upper body was created in the Virtual Reality Modeling Language (VRML) and is controlled by software based on the OpenGL graphics library. The system is universal: it is intended both for ordinary users and for people with disabilities (in particular, the hard of hearing and the visually impaired), and it presents input text as multimedia output through both the audio and visual modalities.
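The text-processing front end described above segments the input into sentences, words and characters and maps it to HamNoSys descriptions built from four manual features (hand shape, orientation, location, movement). The Python below is a minimal sketch of that data flow only, not the authors' implementation. It assumes a naive regex sentence splitter, a hypothetical two-entry Czech lexicon whose feature values are invented plain-text placeholders (real HamNoSys uses its own symbol set), and letter-by-letter fingerspelling as the fallback for words missing from the lexicon. The names `ManualSign`, `Utterance`, `plan_output` and the lexicon contents are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ManualSign:
    """The four manual features of a HamNoSys-style sign description.

    Field values here are plain-text placeholders, not HamNoSys symbols.
    """
    gloss: str
    hand_shape: str
    orientation: str
    location: str
    movement: str


# Hypothetical lexicon with invented feature values (not real Czech signs).
LEXICON = {
    "ahoj": ManualSign("AHOJ", "flat", "palm-out", "forehead", "arc-outward"),
    "voda": ManualSign("VODA", "index", "palm-in", "chin", "tap"),
}


@dataclass
class Utterance:
    """One sentence prepared for both output channels."""
    text: str                                      # for the speech / talking-head channel
    gestures: list = field(default_factory=list)   # for the signing avatar


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence):
    """Lower-cased word tokens; Python 3 str patterns match Unicode letters."""
    return re.findall(r"\w+", sentence.lower())


def sentence_to_gestures(sentence, lexicon=LEXICON):
    """Map known words to signs; fingerspell unknown words (assumed fallback)."""
    gestures = []
    for word in split_words(sentence):
        if word in lexicon:
            gestures.append(("sign", lexicon[word]))
        else:
            gestures.extend(("letter", ch) for ch in word)
    return gestures


def plan_output(text, lexicon=LEXICON):
    """Pair each sentence's text with its gesture sequence."""
    return [Utterance(s, sentence_to_gestures(s, lexicon))
            for s in split_sentences(text)]


if __name__ == "__main__":
    for utt in plan_output("Ahoj. Voda les!"):
        labels = [g[1].gloss if g[0] == "sign" else g[1] for g in utt.gestures]
        print(utt.text, labels)
```

For the input "Ahoj. Voda les!" the sketch yields two utterances; the second pairs the text "Voda les!" with the sign VODA followed by the letters l, e, s. Keeping the sentence text and the gesture list in one record is one simple way to let the speech and signing channels stay aligned per sentence; the TTS, talking-head and avatar renderers themselves are not modeled here.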

Keywords: multimodal user interfaces, human-computer interaction, sign language, speech synthesis, 3D models, assistive technologies, signing avatar

Acknowledgements. This research was partially supported by the Government of the Russian Federation (grant No. 074-U01), the Russian Foundation for Basic Research (RFBR, project No. 12-08-01265_a), and the European Regional Development Fund (ERDF), project "New Technologies for the Information Society" (NTIS), European Centre of Excellence, ED1.1.00/02.0090.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License