Keywords: computational paralinguistics, speech technology, acoustical analysis, emotion recognition, machine learning, speaker states, acoustical paralinguistic events
Acknowledgements. This research is financially supported by the Russian Foundation for Basic Research (project No. 16-37-60100) and by the Council for Grants of the President of Russia (project No. MD-3035.2015.8).
References
1. Basov O.O., Karpov A.A., Saitov I.A. Metodologicheskie Osnovy Sinteza Polimodal'nykh Infokommunikatsionnykh Sistem Gosudarstvennogo Upravleniya [Methodological Bases of Synthesis of Multimodal Infocommunication Governance Systems]. Orel, Russian Academy of SSF, 2015, 271 p.
2. Schuller B. Voice and speech analysis in search of states and traits. In: Computer Analysis of Human Behavior. Eds. A.A. Salah, T. Gevers. Springer, 2011, pp. 227–253. doi: 10.1007/978-0-85729-994-9_9
3. Schuller B., Rigoll G., Lang M. Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine - belief network architecture. Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP-2004. Montreal, Canada, 2004, pp. 577–580.
4. Schuller B., Vlasenko B., Eyben F., Wollmer M., Stuhlsatz A., Wendemuth A., Rigoll G. Cross-corpus acoustic emotion recognition: variances and strategies. IEEE Transactions on Affective Computing, 2010, vol. 1, no. 2, pp. 119–131. doi: 10.1109/T-AFFC.2010.8
5. El Ayadi M., Kamel M.S., Karray F. Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recognition, 2011, vol. 44, no. 3, pp. 572–587. doi: 10.1016/j.patcog.2010.09.020
6. Dhall A., Goecke R., Lucey S., Gedeon T. Collecting large, richly annotated facial-expression databases from movies. IEEE MultiMedia, 2012, vol. 19, no. 3, pp. 34–41. doi: 10.1109/MMUL.2012.26
7. Makarova V., Petrushin V. RUSLANA: a database of Russian emotional utterances. Proc. ICSLP-2002. Denver, USA, 2002, pp. 2041–2044.
8. Burkhardt F., Paeschke A., Rolfes M., Sendlmeier W., Weiss B. A database of German emotional speech. Proc. 9th European Conf. on Speech Communication and Technology. Lisbon, Portugal, 2005, pp. 1517–1520.
9. Kaya H., Salah A.A., Gurgen S.F., Ekenel H. Protocol and baseline for experiments on Bogazici University Turkish emotional speech corpus. Proc. 22nd Signal Processing and Communications Applications Conf. Trabzon, Turkey, 2014, pp. 1698–1701. doi: 10.1109/SIU.2014.6830575
10. Schuller B., Steidl S., Batliner A., Vinciarelli A., Scherer K., Ringeval F., Chetouani M., Weninger F., Eyben F., Marchi E., Mortillaro M., Salamin H., Polychroniou A., Valente F., Kim S. The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 148–152.
11. Eyben F., Weninger F., Groß F., Schuller B. Recent developments in OpenSMILE, the Munich open-source multimedia feature extractor. Proc. 21st ACM Int. Conf. on Multimedia. Barcelona, Spain, 2013, pp. 835–838. doi: 10.1145/2502081.2502224
12. Bozkurt E., Erzin E., Erdem C.E., Erdem A.T. Formant position based weighted spectral features for emotion recognition. Speech Communication, 2011, vol. 53, no. 9–10, pp. 1186–1197. doi: 10.1016/j.specom.2011.04.003
13. Alpaydin E. Introduction to Machine Learning. 2nd ed. MIT Press, 2010, 581 p.
14. Kaya H., Salah A.A. Combining modality-specific extreme learning machines for emotion recognition in the wild. Proc. 16th Int. Conf. on Multimodal Interaction ICMI-2014. Istanbul, Turkey, 2014, pp. 487–493. doi: 10.1145/2663204.2666273
15. Schuller B., Villar R.J., Rigoll G., Lang M.K. Meta-classifiers in acoustic and linguistic feature fusion-based affect recognition. Proc. IEEE Int. Conf. ICASSP-2005. Philadelphia, USA, 2005, pp. 325–328. doi: 10.1109/ICASSP.2005.1415116
16. Schuller B., Steidl S., Batliner A. The INTERSPEECH 2009 emotion challenge. Proc. INTERSPEECH-2009. Brighton, UK, 2009, pp. 312–315.
17. Lee C.-C., Mower E., Busso C., Lee S., Narayanan S. Emotion recognition using a hierarchical binary decision tree approach. Proc. INTERSPEECH-2009. Brighton, UK, 2009, pp. 320–323.
18. Dumouchel P., Dehak N., Attabi Y., Dehak R., Boufaden N. Cepstral and long-term features for emotion recognition. Proc. INTERSPEECH-2009. Brighton, UK, 2009, pp. 344–347.
19. Schuller B., Steidl S., Batliner A., Burkhardt F., Devillers L., Mueller C., Narayanan S. The INTERSPEECH 2010 paralinguistic challenge. Proc. INTERSPEECH-2010. Makuhari, Japan, 2010, pp. 2794–2797.
20. Kockmann M., Burget L., Cernocky J. Brno University of Technology system for INTERSPEECH 2010 paralinguistic challenge. Proc. INTERSPEECH-2010. Makuhari, Japan, 2010, pp. 2822–2825.
21. Meinedo H., Trancoso I. Age and gender classification using fusion of acoustic and prosodic features. Proc. INTERSPEECH-2010. Makuhari, Japan, 2010, pp. 2818–2821.
22. Jeon J.H., Xia R., Liu Y. Level of interest sensing in spoken dialog using multi-level fusion of acoustic and lexical evidence. Proc. INTERSPEECH-2010. Makuhari, Japan, 2010, pp. 2802–2805.
23. Schuller B., Steidl S., Batliner A., Schiel F., Krajewski J. The INTERSPEECH 2011 speaker state challenge. Proc. INTERSPEECH-2011. Florence, Italy, 2011, pp. 3201–3204.
24. Bone D., Black M.P., Li M., Metallinou A., Lee S., Narayanan S.S. Intoxicated speech detection by fusion of speaker normalized hierarchical features and GMM supervectors. Proc. INTERSPEECH-2011. Florence, Italy, 2011, pp. 3217–3220.
25. Huang D.Y., Ge S.S., Zhang Z. Speaker state classification based on fusion of asymmetric SIMPLS and support vector machines. Proc. INTERSPEECH-2011. Florence, Italy, 2011, pp. 3301–3304.
26. Schuller B., Steidl S., Batliner A., Nöth E., Vinciarelli A., Burkhardt F., van Son R., Weninger F., Eyben F., Bocklet T., Mohammadi G., Weiss B. The INTERSPEECH 2012 speaker trait challenge. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 254–257.
27. Ivanov A., Chen X. Modulation spectrum analysis for speaker personality trait recognition. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 278–281.
28. Montacie C., Caraty M.-J. Pitch and intonation contribution to speakers’ traits classification. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 526–529.
29. Kim J., Kumar N., Tsiartas A., Li M., Narayanan S. Intelligibility classification of pathological speech using fusion of multiple subsystems. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 534–537.
30. Anumanchipalli G.K., Meinedo H., Bugalho M., Trancoso I., Oliveira L.C., Black A.W. Text-dependent pathological voice detection. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 530–533.
31. Brueckner R., Schuller B. Likability classification - a not so deep neural network approach. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 290–293.
32. Buisman H., Postma E. The log-Gabor method: speech classification using spectrogram image analysis. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 518–521.
33. Lu D., Sha F. Predicting likability of speakers with Gaussian processes. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 286–289.
34. Huang D.-Y., Zhu Y., Wu D., Yu R. Detecting intelligibility by linear dimensionality reduction and normalized voice quality hierarchical features. Proc. INTERSPEECH-2012. Portland, USA, 2012, pp. 546–549.
35. Zhang Z., Coutinho E., Deng J., Schuller B. Cooperative learning and its application to emotion recognition from speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015, vol. 23, no. 1, pp. 115–126.
36. Asgari M., Bayestehtashk A., Shafran I. Robust and accurate features for detecting and diagnosing autism spectrum disorders. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 191–194.
37. Rasanen O., Pohjalainen J. Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 210–214.
38. Gosztolya G., Busa-Fekete R., Toth L. Detecting autism, emotions and social signals using AdaBoost. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 220–224.
39. Gupta R., Audhkhasi K., Lee S., Narayanan S. Paralinguistic event detection from speech using probabilistic time-series smoothing and masking. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 173–177.
40. Kaya H., Ozkaptan T., Salah A.A., Gürgen F. Random discriminative projection based feature selection with application to conflict recognition. IEEE Signal Processing Letters, 2015, vol. 22, no. 6, pp. 671–675. doi: 10.1109/LSP.2014.2365393
41. Martinez D., Ribas D., Lleida E., Ortega A., Miguel A. Suprasegmental information modelling for autism disorder spectrum and specific language impairment classification. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 195–199.
42. Lee H.-Y., Hu T.-Y., Jing H., Chang Y.-F., Tsao Y., Kao Y.-C., Pao T.-L. Ensemble of machine learning and acoustic segment model techniques for speech emotion and autism spectrum disorders recognition. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 215–219.
43. Grezes F., Richards J., Rosenberg A. Let me finish: automatic conflict detection using speaker overlap. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 200–204.
44. Sethu V., Epps J., Ambikairajah E., Li H. GMM based speaker variability compensated system for INTERSPEECH 2013 ComParE emotion challenge. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 205–209.
45. Janicki A. Non-linguistic vocalisation recognition based on hybrid GMM-SVM approach. Proc. INTERSPEECH-2013. Lyon, France, 2013, pp. 153–157.
46. Schuller B., Steidl S., Batliner A., Epps J., Eyben F., Ringeval F., Marchi E., Zhang Y. The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load. Proc. INTERSPEECH-2014. Singapore, 2014, pp. 427–431.
47. Kaya H., Ozkaptan T., Salah A.A., Gurgen S.F. Canonical correlation analysis and local Fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction. Proc. INTERSPEECH-2014. Singapore, 2014, pp. 442–446.
48. Van Segbroeck M., Travadi R., Vaz C., Kim J., Black M.P., Potamianos A., Narayanan S. Classification of cognitive load from speech using an i-vector framework. Proc. INTERSPEECH-2014. Singapore, 2014, pp. 751–755.
49. Kaya H., Eyben F., Salah A.A., Schuller B.W. CCA based feature selection with application to continuous depression recognition from acoustic speech features. Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP-2014. Florence, Italy, 2014, pp. 3729–3733.
50. Kua J., Sethu V., Le P., Ambikairajah E. The UNSW submission to INTERSPEECH 2014 ComParE cognitive load challenge. Proc. INTERSPEECH-2014. Singapore, 2014, pp. 746–750.
51. Gosztolya G., Grosz T., Busa-Fekete R., Toth L. Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks. Proc. INTERSPEECH-2014. Singapore, 2014, pp. 452–456.
52. Schuller B., Steidl S., Batliner A., Hantke S., Hönig F., Orozco-Arroyave J.R., Nöth E., Zhang Y., Weninger F. The INTERSPEECH 2015 computational paralinguistics challenge: nativeness, Parkinson’s & eating condition. Proc. INTERSPEECH-2015. Dresden, Germany, 2015, pp. 478–482.
53. Black M., Bone D., Skordilis Z., Gupta R., Xia W., Papadopoulos P., Chakravarthula S., Xiao B., Segbroeck M., Kim J., Georgiou P., Narayanan S. Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales. Proc. INTERSPEECH-2015. Dresden, Germany, 2015, pp. 493–497.
54. Grosz T., Busa-Fekete R., Gosztolya G., Toth L. Assessing the degree of nativeness and Parkinson's condition using Gaussian processes and deep rectifier neural networks. Proc. INTERSPEECH-2015. Dresden, Germany, 2015, pp. 919–923.
55. Kaya H., Karpov A., Salah A. Fisher vectors with cascaded normalization for paralinguistic analysis. Proc. INTERSPEECH-2015. Dresden, Germany, 2015, pp. 909–913.
56. Ribeiro E., Ferreira J., Olcoz J., Abad A., Moniz H., Batista F., Trancoso I. Combining multiple approaches to predict the degree of nativeness. Proc. INTERSPEECH-2015. Dresden, Germany, 2015, pp. 488–492.
57. Kim J., Nasir M., Gupta R., Segbroeck M., Bone D., Black M., Skordilis Z., Yang Z., Georgiou P., Narayanan S. Automatic estimation of Parkinson's disease severity from diverse speech tasks. Proc. INTERSPEECH-2015. Dresden, Germany, 2015, pp. 914–918.
58. Milde B., Biemann C. Using representation learning and out-of-domain data for a paralinguistic speech task. Proc. INTERSPEECH-2015. Dresden, Germany, 2015, pp. 904–908.
59. Hahm S., Wang J. Parkinson's condition estimation using speech acoustic and inversely mapped articulatory data. Proc. INTERSPEECH-2015. Dresden, Germany, 2015, pp. 513–517.
60. Hantke S., Weninger F., Kurle R., Ringeval F., Batliner A., El-Desoky Mousa A., Schuller B. I hear you eat and speak: automatic recognition of eating condition and food type, use-cases, and impact on ASR performance. PLoS ONE, 2016, vol. 11, no. 5. doi: 10.1371/journal.pone.0154486
61. Kaya H., Karpov A., Salah A.A. Robust acoustic emotion recognition based on cascaded normalization and extreme learning machines. Lecture Notes in Computer Science, 2016, vol. 9719. doi: 10.1007/978-3-319-40663-3_14
62. Lyakso E., Frolova O., Dmitrieva E., Grigorev A., Kaya H., Salah A.A., Karpov A. EmoChildRu: emotional child Russian speech corpus. Lecture Notes in Computer Science, 2015, vol. 9319, pp. 144–152. doi: 10.1007/978-3-319-23132-7_18
63. Schuller B., Steidl S., Batliner A., Hirschberg J., Burgoon J.K., Baird A., Elkins A., Zhang Y., Coutinho E., Evanini K. The INTERSPEECH 2016 computational paralinguistics challenge: deception, sincerity & native language. Proc. INTERSPEECH-2016. San Francisco, USA, 2016.
64. Kaya H., Karpov A. Fusing acoustic feature representations for computational paralinguistics tasks. Proc. INTERSPEECH-2016. San Francisco, USA, 2016.