DOI: 10.17586/2226-1494-2016-16-3-387-401


D. V. Ivanko, I. S. Kipyatkova, A. L. Ronzhin, A. A. Karpov

Article in Russian

For citation: Ivanko D.V., Kipyatkova I.S., Ronzhin A.L., Karpov A.A. Analysis of multimodal fusion techniques for audio-visual speech recognition. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2016, vol. 16, no. 3, pp. 387–401. doi: 10.17586/2226-1494-2016-16-3-387-401


The paper presents an analytical review of recent achievements in audio-visual (AV) fusion (integration) of multimodal information. We discuss the main challenges and report on approaches that address them. One of the most important tasks in AV integration is to understand how the modalities interact and influence each other. The paper addresses this problem in the context of AV speech processing and speech recognition. In the first part of the review we set out the basic principles of AV speech recognition and give a classification of the audio and visual features of speech. Special attention is paid to systematizing the existing techniques and AV data fusion methods. In the second part, based on our analysis of the research area, we provide a consolidated list of tasks and applications that use AV fusion, and indicate the methods, techniques, and audio and video features used. We propose a classification of AV integration and discuss the advantages and disadvantages of different approaches. We draw conclusions and offer our assessment of the future of the AV fusion field. In further research we plan to implement a system for audio-visual recognition of continuous Russian speech using advanced multimodal fusion methods.

Keywords: audio-visual integration, audio-visual speech recognition, multimodal analysis, multimodal fusion, deep learning

Acknowledgements. The research is financially supported by the Russian Foundation for Basic Research (projects No. 15-07-04415-a and 15-07-04322-a) and by the Council for Grants of the President of Russia (projects No. MD-3035.2015.8 and MK-5209.2015.8).


