DOI: 10.17586/2226-1494-2016-16-4-581-592


A. A. Karpov, H. Kaya, A. A. Salah

For citation: Karpov A.A., Kaya H., Salah A.A. State-of-the-art tasks and achievements of paralinguistic speech analysis systems. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2016, vol. 16, no. 4, pp. 581–592. doi: 10.17586/2226-1494-2016-16-4-581-592


We present an analytical survey of current tasks in computational paralinguistics and of recent achievements of automatic systems for paralinguistic analysis of conversational speech. Paralinguistics studies non-verbal aspects of human communication and speech, such as natural emotions, accents, psycho-physiological states, pronunciation features, and speaker’s voice parameters. We describe the architecture of a baseline computer system for acoustical paralinguistic analysis, its main components, and useful speech processing methods. We give an overview of the international Computational Paralinguistics Challenge (ComParE), which has been held annually since 2009 within the INTERSPEECH conference organized by the International Speech Communication Association. We present the sub-challenges (tasks) proposed at the ComParE Challenges in 2009–2016 and analyze the winning systems and the results obtained for each sub-challenge. The most recently completed challenge, ComParE-2015, was held in September 2015 in Germany and proposed three sub-challenges: 1) the Degree of Nativeness (DN) sub-challenge, determining the degree of nativeness of speakers from acoustics; 2) the Parkinson’s Condition (PC) sub-challenge, recognizing the degree of Parkinson’s condition from speech; 3) the Eating Condition (EC) sub-challenge, determining whether the speaker is eating while speaking or in a dialogue and classifying the type of food consumed (one of seven classes). The EC sub-challenge was won by a joint Turkish-Russian team consisting of the authors of this paper. Our system was the best-performing one for detection and classification of the corresponding (EC) acoustical paralinguistic events.
The paper describes the architecture of this system, its main modules and methods, the audio data used for training and evaluation, and the best results obtained in automatic classification of these acoustical paralinguistic events.

Keywords: computational paralinguistics, speech technology, acoustical analysis, emotion recognition, machine learning, speaker states, acoustical paralinguistic events

Acknowledgements. This research is financially supported by the Russian Foundation for Basic Research (project No. 16-37-60100) and by the Council for Grants of the President of Russia (project No. MD-3035.2015.8).

