doi: 10.17586/2226-1494-2024-24-5-834-842


Creation and analysis of multimodal corpus for aggressive behavior recognition

M. Y. Uzdiaev, A. A. Karpov


Article in Russian

For citation:
Uzdiaev M.Yu., Karpov A.A. Creation and analysis of multimodal corpus for aggressive behavior recognition. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 5, pp. 834–842 (in Russian). doi: 10.17586/2226-1494-2024-24-5-834-842 


Abstract
The development of digital communication systems is accompanied by a growing number of disruptive behavior incidents that require a rapid response to prevent negative consequences. Because human aggression is weakly formalized, machine learning approaches are the most suitable for this area, and they require representative sets of relevant data for efficient aggression recognition. Developing such datasets raises problems of label relevance to real behavior, consistency of the situations in which the behavior is manifested, and naturalness of the behavior. The purpose of this work is to develop a methodology for creating aggressive behavior datasets that reflects the key aspects of aggression and provides relevant data. The paper presents the developed methodology for creating multimodal datasets of natural aggressive behavior. An analysis of the subject area of human aggression substantiates the key aspects of aggression manifestations (the presence of a subject and an object of aggression, the destructiveness of the aggressive action), defines the units of behavior analysis (time intervals of audio and video with localized informants), specifies the types of aggression considered (overt direct physical and verbal aggression), and substantiates criteria for assessing aggressive behavior as a set of aggressive actions defining each aggression type. The methodology consists of the following stages: collecting video on the Internet, identifying time intervals in which aggression occurs, localizing informants in video frames, transcribing informants' speech, collective labeling of physical and verbal aggression actions by a group of annotators (raters), and assessing the reliability of annotation agreement using Fleiss' kappa coefficient. To evaluate the methodology, a new corpus of audiovisual aggressive behavior in online streams (AVABOS) was collected and labeled.
The dataset contains audio and video segments with verbal and physical aggression, respectively, manifested by Russian-speaking informants during online video streams. The interrater reliability results show substantial agreement for physical aggression (κ = 0.74) and moderate agreement for verbal aggression (κ = 0.48), which validates the developed methodology. The AVABOS dataset can be used in automatic aggression recognition tasks, and the developed methodology can also be applied to creating datasets covering other types of behavior.
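The agreement assessment described above can be illustrated with a short sketch. The snippet below is not the authors' code; it is a minimal, self-contained implementation of the standard Fleiss' kappa formula (Fleiss, 1971) plus the Landis and Koch (1977) verbal scale used to read κ = 0.74 as "substantial" and κ = 0.48 as "moderate". The toy rating matrix is hypothetical.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to
    category j; every row must sum to the same rater count n.
    """
    N = len(ratings)            # number of rated items (subjects)
    n = sum(ratings[0])         # raters per item
    k = len(ratings[0])         # number of categories

    # Mean observed per-item agreement P_bar
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N

    # Chance agreement P_e from marginal category proportions
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)


def landis_koch(kappa):
    """Verbal interpretation of kappa after Landis and Koch (1977)."""
    label = "poor"
    for lower, name in [(0.0, "slight"), (0.2, "fair"), (0.4, "moderate"),
                        (0.6, "substantial"), (0.8, "almost perfect")]:
        if kappa > lower:
            label = name
    return label


# Hypothetical example: 4 segments, 3 raters, 2 labels (aggr. / non-aggr.)
ratings = [[2, 1], [1, 2], [3, 0], [0, 3]]
kappa = fleiss_kappa(ratings)
print(round(kappa, 4), landis_koch(kappa))
```

On this scale the article's values fall into the "substantial" (0.61–0.80) and "moderate" (0.41–0.60) bands, respectively.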

Keywords: methodology for creating multimodal dataset, methodology for behavior assessment, aggressive behavior, aggression recognition, dataset creation, collective labeling, interrater reliability assessment, Fleiss’ kappa coefficient

Acknowledgements. This work was supported financially by the Russian Science Foundation (project No. 22-11-00321, https://www.rscf.ru/project/22-11-00321/).

References
  1. Lefter I., Rothkrantz L.J.M., Burghouts G.J. A comparative study on automatic audio–visual fusion for aggression detection using meta-information. Pattern Recognition Letters, 2013, vol. 34, no. 15, pp. 1953–1963. https://doi.org/10.1016/j.patrec.2013.01.002
  2. Lefter I., Burghouts G.J., Rothkrantz L.J.M. An audio-visual dataset of human–human interactions in stressful situations. Journal on Multimodal User Interfaces, 2014, vol. 8, no. 1, pp. 29–41. https://doi.org/10.1007/s12193-014-0150-7
  3. Lefter I., Jonker C.M., Tuente S.K., Veling W., Bogaerts S. NAA: A multimodal database of negative affect and aggression. Proc. of the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), 2017, pp. 21–27. https://doi.org/10.1109/ACII.2017.8273574
  4. Sernani P., Falcionelli N., Tomassini S., Contardo P., Dragoni A.F. Deep learning for automatic violence detection: Tests on the AIRTLab dataset. IEEE Access, 2021, vol. 9, pp. 160580–160595. https://doi.org/10.1109/ACCESS.2021.3131315
  5. Ciampi L., Foszner P., Messina N., Staniszewski M., Gennaro C., Falchi F., Serao G., Cogiel M., Golba D., Szczęsna A., Amato G. Bus violence: An open benchmark for video violence detection on public transport. Sensors, 2022, vol. 22, no. 21, pp. 8345. https://doi.org/10.3390/s22218345
  6. Perez M., Kot A.C., Rocha A. Detection of real-world fights in surveillance videos. Proc. of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 2662–2666. https://doi.org/10.1109/ICASSP.2019.8683676
  7. Cheng M., Cai K., Li M. RWF-2000: An open large scale video database for violence detection. Proc. of the 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 4183–4190. https://doi.org/10.1109/ICPR48806.2021.9412502
  8. Potapova R., Komalova L. On principles of annotated databases of the semantic field “aggression”. Lecture Notes in Computer Science, 2014, vol. 8773, pp. 322–328. https://doi.org/10.1007/978-3-319-11581-8_40
  9. Apanasovich K.S., Makhnytkina O.V., Kabarov V.I., Dalevskaya O.P. RuPersonaChat: a dialog corpus for personalizing conversational agents. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 2, pp. 214–221. (in Russian). https://doi.org/10.17586/2226-1494-2024-24-2-214-221
  10. Hassoun Al-Jawad M.M., Alharbi H., Almukhtar A.F., Alnawas A.A. Constructing Twitter corpus of Iraqi Arabic Dialect (CIAD) for sentiment analysis. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 2, pp. 308–316. https://doi.org/10.17586/2226-1494-2022-22-2-308-316
  11. Busso C., Bulut M., Lee C., Kazemzadeh A., Mower E., Kim S., Chang J.N., Lee S., Narayanan S.S. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008, vol. 42, no. 4, pp. 335–359. https://doi.org/10.1007/s10579-008-9076-6
  12. Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing. Lecture Notes in Computer Science, 2018, vol. 11096, pp. 501–510. https://doi.org/10.1007/978-3-319-99579-3_52
  13. Ringeval F., Sonderegger A., Sauer J., Lalanne D. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. Proc. of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2013, pp. 1–8. https://doi.org/10.1109/FG.2013.6553805
  14. Busso C., Parthasarathy S., Burmania A., AbdelWahab M., Sadoughi N., Provost E.M. MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affective Computing, 2017, vol. 8, no. 1, pp. 67–80. https://doi.org/10.1109/TAFFC.2016.2515617
  15. Enikolopov S.N. The concept of aggression in the contemporary psychology. Prikladnaja psihologija, 2001, no. 1, pp. 60–72. (in Russian)
  16. Groth-Marnat G., Wright A.J. Handbook of Psychological Assessment. John Wiley & Sons, 2016, 824 p.
  17. Uzdiaev M., Vatamaniuk I. Investigation of manifestations of aggressive behavior by users of sociocyberphysical systems on video. Lecture Notes in Networks and Systems, 2021, vol. 231, pp. 593–604. https://doi.org/10.1007/978-3-030-90321-3_49
  18. Buss A.H. The Psychology of Aggression. John Wiley & Sons, 1961, 307 p. https://doi.org/10.1037/11160-000
  19. Radford A., Kim J.W., Xu T., Brockman G., McLeavey C., Sutskever I. Robust speech recognition via large-scale weak supervision. Proc. of the International Conference on Machine Learning (PMLR), 2023, vol. 202, pp. 28492–28518.
  20. Plaquet A., Bredin H. Powerset multi-class cross entropy loss for neural speaker diarization. Proc. of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2023, pp. 3222–3226. https://doi.org/10.21437/Interspeech.2023-205
  21. Lausberg H., Sloetjes H. Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods, 2009, vol. 41, no. 3, pp. 841–849. https://doi.org/10.3758/BRM.41.3.841
  22. Fleiss J.L. Measuring nominal scale agreement among many raters. Psychological Bulletin, 1971, vol. 76, no. 5, pp. 378–382. https://doi.org/10.1037/h0031619
  23. Uzdiaev M.Iu., Karpov A.A. Audiovisual Aggressive Behavior in Online Streams dataset – AVABOS. Certificate of state registration of the database 2022623239, 2022. (in Russian)
  24. Landis J.R., Koch G.G. The measurement of observer agreement for categorical data. Biometrics, 1977, vol. 33, no. 1, pp. 159–174. https://doi.org/10.2307/2529310
  25. Fleiss J.L., Levin B., Paik M.C. Statistical Methods for Rates and Proportions. John Wiley & Sons, 2013, 800 p.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright 2001-2024 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.
