BILINGUAL MULTIMODAL SYSTEM FOR TEXT-TO-AUDIOVISUAL SPEECH AND SIGN LANGUAGE SYNTHESIS

A. A. Karpov, M. Zelezny



Abstract

We present a conceptual model, architecture and software implementation of a multimodal system that synthesizes audio-visual speech and sign language from input text. The main components of the developed multimodal synthesis system (a signing avatar) are: an automatic text processor for analysis of the input text; a 3D model of a human head; a computer text-to-speech synthesizer; a system for audio-visual speech synthesis; a 3D model of human hands and upper body; and a multimodal user interface that integrates all the components to generate audio, visual and signed speech. The proposed system automatically translates input text into speech (audio information) and gestures (video information), fuses the two streams and outputs the result as multimedia. A user can enter any grammatically correct text in Russian or Czech; the text processor analyzes it to detect sentences, words and characters. This textual information is then converted into symbols of a sign language notation. We apply the international Hamburg Notation System (HamNoSys), which describes the main distinctive features of each manual sign: hand shape, hand orientation, place of articulation and type of movement. Based on these features, the 3D signing avatar displays the elements of the sign language. The virtual 3D model of the human head and upper body was created in the VRML virtual reality modeling language and is controlled by software based on the OpenGL graphics library. The developed multimodal synthesis system is universal in that it is intended both for regular users and for people with disabilities (in particular, the hard-of-hearing and the visually impaired), and it provides multimedia output of input text through the audio and visual modalities.
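The text-processing stage described above (detecting sentences, words and characters, then mapping them to notation symbols) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names, the toy lexicon, the placeholder strings standing in for real HamNoSys symbols, and the choice to spell out-of-lexicon words character by character.

```python
import re

# Hypothetical toy lexicon: word -> placeholder for a sign notation string.
# Real HamNoSys symbol strings are deliberately not reproduced here.
TOY_LEXICON = {
    "hello": "<sign:HELLO>",
    "world": "<sign:WORLD>",
}


def segment(text):
    """Split text into sentences, then each sentence into lowercase words."""
    # Break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # \w+ matches Unicode letters, so Cyrillic and Czech words are kept whole.
    return [re.findall(r"\w+", s.lower()) for s in sentences]


def to_notation(words, lexicon):
    """Map words to notation placeholders; spell unknown words by character."""
    symbols = []
    for word in words:
        if word in lexicon:
            symbols.append(lexicon[word])
        else:
            # Assumed fallback: emit one placeholder per character.
            symbols.extend(f"<letter:{ch}>" for ch in word)
    return symbols


def text_to_notation(text, lexicon=TOY_LEXICON):
    """Return one list of notation placeholders per detected sentence."""
    return [to_notation(words, lexicon) for words in segment(text)]
```

For example, `text_to_notation("Hello world. Hello Bob!")` yields two symbol lists, one per sentence, with the unknown word "Bob" expanded into per-character placeholders. In a complete system, each symbol would then drive the avatar's hand and body animation.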


Keywords: multimodal user interfaces, human-computer interaction, sign language, speech synthesis, 3D models, assistive technologies, signing avatar

Acknowledgements. This research was partially supported by the Government of the Russian Federation (grant No. 074-U01), the Russian Foundation for Basic Research (RFBR, project No. 12-08-01265_a) and the European Regional Development Fund (ERDF), project "New Technologies for the Information Society" (NTIS), European Centre of Excellence, ED1.1.00/02.0090.

Copyright 2001-2017 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.
