EXTENDED SPEECH EMOTION RECOGNITION AND PREDICTION

Anagnostopoulos Theodoros, Khoruzhnikov Sergey E., Grudinin Vladimir A., Skourlas Christos

2014, VOLUME 14, NUMBER 6 (NOVEMBER-DECEMBER)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

Article in English

Abstract

Humans are considered to reason and act rationally, and this is believed to be their fundamental difference from other living entities. Modern approaches in psychology, however, underline that humans, as thinking creatures, are also sentimental and emotional organisms. There are fifteen universal extended emotions plus a neutral state: hot anger, cold anger, panic, fear, anxiety, despair, sadness, elation, happiness, interest, boredom, shame, pride, disgust, contempt and neutral. The scope of the current research is to understand the emotional state of a human being by capturing the speech utterances used during ordinary conversation. It is shown that, given enough acoustic evidence, the emotional state of a person can be classified by a majority voting ensemble of classifiers. The proposed ensemble is built from three base classifiers: kNN, C4.5 and an SVM with an RBF kernel. It achieves better performance than each base classifier taken separately. It is compared with two other ensembles: a one-against-all (OAA) multiclass SVM with hybrid kernels, and an ensemble consisting of two base classifiers, C5.0 and a neural network. The proposed ensemble achieves better performance than both. The paper thus deals with emotion classification by a majority voting ensemble that combines three specific types of base classifiers with low computational complexity. The base classifiers stem from different theoretical backgrounds in order to avoid bias and redundancy, which gives the proposed ensemble the ability to generalize in the emotion domain space.
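The majority voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example predictions, and the tie-breaking rule (the earliest-listed classifier wins when all three disagree) are assumptions made here for illustration only, since the abstract does not specify how ties are resolved.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most base classifiers for one utterance.

    `votes` holds one label per base classifier, listed in a priority
    order (an assumption of this sketch). On a tie, the label that
    appears earliest in `votes` wins.
    """
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:
        if counts[label] == top:
            return label


def ensemble_predict(per_classifier_predictions):
    """Combine per-classifier label lists into one label per utterance."""
    return [majority_vote(votes) for votes in zip(*per_classifier_predictions)]


# Hypothetical outputs of three base classifiers on four utterances.
knn_pred = ["fear", "panic", "sadness", "neutral"]
tree_pred = ["fear", "anxiety", "sadness", "pride"]
svm_pred = ["panic", "anxiety", "despair", "shame"]

combined = ensemble_predict([knn_pred, tree_pred, svm_pred])
print(combined)
```

With sixteen classes and only three voters, a three-way disagreement is possible, so any practical ensemble of this kind needs some explicit tie-breaking policy; the priority-order rule above is just one simple option.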

Keywords: speech emotion recognition, affective computing, machine learning

Acknowledgements. The research was carried out with the financial support of the Ministry of Education and Science of the Russian Federation under grant agreement №14.575.21.0058.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License