doi: 10.17586/2226-1494-2024-24-5-834-842


Creation and analysis of multimodal corpus for aggressive behavior recognition

M. Y. Uzdiaev, A. A. Karpov


Article in Russian

For citation:
Uzdiaev M.Yu., Karpov A.A. Creation and analysis of multimodal corpus for aggressive behavior recognition. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 5, pp. 834–842 (in Russian). doi: 10.17586/2226-1494-2024-24-5-834-842 


Abstract
The development of digital communication systems is accompanied by a growing number of disruptive behavior incidents that require a rapid response to prevent negative consequences. Because human aggression is weakly formalized, machine learning approaches are the most suitable for this area, and they require representative sets of relevant data for efficient aggression recognition. Developing such datasets raises problems of label relevance to real behavior, consistency of the situations in which the behavior is manifested, and naturalness of the behavior. The purpose of this work is to develop a methodology for creating aggressive behavior datasets that reflects the key aspects of aggression and provides relevant data. The paper presents the developed methodology for creating multimodal datasets of natural aggressive behavior. An analysis of the subject area of human aggression substantiates the key aspects of aggression manifestations (the presence of a subject and an object of aggression, the destructiveness of the aggressive action), defines the units of behavior analysis (time intervals of audio and video with localized informants), specifies the types of aggression considered (overt direct physical and verbal aggression), and substantiates criteria for assessing aggressive behavior as a set of aggressive actions defining each aggression type. The methodology consists of the following stages: collecting video on the Internet, identifying time intervals in which aggression occurs, localizing informants in video frames, transcribing informants' speech, collective labeling of physical and verbal aggression actions by a group of annotators (raters), and assessing the reliability of annotation agreement using Fleiss' kappa coefficient. To evaluate the methodology, a new corpus of audiovisual aggressive behavior in online streams (AVABOS) was collected and labeled.
The dataset contains audio and video segments with verbal and physical aggression, respectively, manifested by Russian-speaking informants during online video streams. The interrater reliability results show substantial agreement for physical aggression (κ = 0.74) and moderate agreement for verbal aggression (κ = 0.48), which validates the developed methodology. The AVABOS dataset can be used in automatic aggression recognition tasks, and the developed methodology can also be applied to creating datasets covering other types of behavior.
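The agreement assessment described above can be illustrated with a short sketch. The snippet below is not the authors' code; it is a minimal, self-contained implementation of the standard Fleiss' kappa formula (Fleiss, 1971) plus the Landis and Koch (1977) verbal scale used to read κ = 0.74 as "substantial" and κ = 0.48 as "moderate". The toy rating matrix is hypothetical.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to
    category j; every row must sum to the same rater count n.
    """
    N = len(ratings)            # number of rated items (subjects)
    n = sum(ratings[0])         # raters per item
    k = len(ratings[0])         # number of categories

    # Mean observed per-item agreement P_bar
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N

    # Chance agreement P_e from marginal category proportions
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)


def landis_koch(kappa):
    """Verbal interpretation of kappa after Landis and Koch (1977)."""
    label = "poor"
    for lower, name in [(0.0, "slight"), (0.2, "fair"), (0.4, "moderate"),
                        (0.6, "substantial"), (0.8, "almost perfect")]:
        if kappa > lower:
            label = name
    return label


# Hypothetical example: 4 segments, 3 raters, 2 labels (aggr. / non-aggr.)
ratings = [[2, 1], [1, 2], [3, 0], [0, 3]]
kappa = fleiss_kappa(ratings)
print(round(kappa, 4), landis_koch(kappa))
```

On this scale the article's values fall into the "substantial" (0.61–0.80) and "moderate" (0.41–0.60) bands, respectively.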

Keywords: methodology for creating multimodal dataset, methodology for behavior assessment, aggressive behavior, aggression recognition, dataset creation, collective labeling, interrater reliability assessment, Fleiss’ kappa coefficient

Acknowledgements. This work was supported financially by the Russian Science Foundation (project No. 22-11-00321, https://www.rscf.ru/project/22-11-00321/).

References
  1. Lefter I., Rothkrantz L.J.M., Burghouts G.J. A comparative study on automatic audio–visual fusion for aggression detection using meta-information. Pattern Recognition Letters, 2013, vol. 34, no. 15, pp. 1953–1963. https://doi.org/10.1016/j.patrec.2013.01.002
  2. Lefter I., Burghouts G.J., Rothkrantz L.J.M. An audio-visual dataset of human–human interactions in stressful situations. Journal on Multimodal User Interfaces, 2014, vol. 8, no. 1, pp. 29–41. https://doi.org/10.1007/s12193-014-0150-7
  3. Lefter I., Jonker C.M., Tuente S.K., Veling W., Bogaerts S. NAA: A multimodal database of negative affect and aggression. Proc. of the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), 2017, pp. 21–27. https://doi.org/10.1109/ACII.2017.8273574
  4. Sernani P., Falcionelli N., Tomassini S., Contardo P., Dragoni A.F. Deep learning for automatic violence detection: Tests on the AIRTLab dataset. IEEE Access, 2021, vol. 9, pp. 160580–160595. https://doi.org/10.1109/ACCESS.2021.3131315
  5. Ciampi L., Foszner P., Messina N., Staniszewski M., Gennaro C., Falchi F., Serao G., Cogiel M., Golba D., Szczęsna A., Amato G. Bus violence: An open benchmark for video violence detection on public transport. Sensors, 2022, vol. 22, no. 21, pp. 8345. https://doi.org/10.3390/s22218345
  6. Perez M., Kot A.C., Rocha A. Detection of real-world fights in surveillance videos. Proc. of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 2662–2666. https://doi.org/10.1109/ICASSP.2019.8683676
  7. Cheng M., Cai K., Li M. RWF-2000: An open large scale video database for violence detection. Proc. of the 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 4183–4190. https://doi.org/10.1109/ICPR48806.2021.9412502
  8. Potapova R., Komalova L. On principles of annotated databases of the semantic field “aggression”. Lecture Notes in Computer Science, 2014, vol. 8773, pp. 322–328. https://doi.org/10.1007/978-3-319-11581-8_40
  9. Apanasovich K.S., Makhnytkina O.V., Kabarov V.I., Dalevskaya O.P. RuPersonaChat: a dialog corpus for personalizing conversational agents. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 2, pp. 214–221. (in Russian). https://doi.org/10.17586/2226-1494-2024-24-2-214-221
  10. Hassoun Al-Jawad M.M., Alharbi H., Almukhtar A.F., Alnawas A.A. Constructing Twitter corpus of Iraqi Arabic Dialect (CIAD) for sentiment analysis. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 2, pp. 308–316. https://doi.org/10.17586/2226-1494-2022-22-2-308-316
  11. Busso C., Bulut M., Lee C., Kazemzadeh A., Mower E., Kim S., Chang J.N., Lee S., Narayanan S.S. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 2008, vol. 42, no. 4, pp. 335–359. https://doi.org/10.1007/s10579-008-9076-6
  12. Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian multimodal corpus of dyadic interaction for affective computing. Lecture Notes in Computer Science, 2018, vol. 11096, pp. 501–510. https://doi.org/10.1007/978-3-319-99579-3_52
  13. Ringeval F., Sonderegger A., Sauer J., Lalanne D. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. Proc. of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2013, pp. 1–8. https://doi.org/10.1109/FG.2013.6553805
  14. Busso C., Parthasarathy S., Burmania A., AbdelWahab M., Sadoughi N., Provost E.M. MSP-IMPROV: An acted corpus of dyadic interactions to study emotion perception. IEEE Transactions on Affective Computing, 2017, vol. 8, no. 1, pp. 67–80. https://doi.org/10.1109/TAFFC.2016.2515617
  15. Enikolopov S.N. The concept of aggression in the contemporary psychology. Prikladnaja psihologija, 2001, no. 1, pp. 60–72. (in Russian)
  16. Groth-Marnat G., Wright A.J. Handbook of Psychological Assessment. John Wiley & Sons, 2016, 824 p.
  17. Uzdiaev M., Vatamaniuk I. Investigation of manifestations of aggressive behavior by users of sociocyberphysical systems on video. Lecture Notes in Networks and Systems, 2021, vol. 231, pp. 593–604. https://doi.org/10.1007/978-3-030-90321-3_49
  18. Buss A.H. The Psychology of Aggression. John Wiley & Sons, 1961, 307 p. https://doi.org/10.1037/11160-000
  19. Radford A., Kim J.W., Xu T., Brockman G., McLeavey C., Sutskever I. Robust speech recognition via large-scale weak supervision. Proc. of the International Conference on Machine Learning (PMLR), 2023, vol. 202, pp. 28492–28518.
  20. Plaquet A., Bredin H. Powerset multi-class cross entropy loss for neural speaker diarization. Proc. of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2023, pp. 3222–3226. https://doi.org/10.21437/Interspeech.2023-205
  21. Lausberg H., Sloetjes H. Coding gestural behavior with the NEUROGES-ELAN system. Behavior Research Methods, 2009, vol. 41, no. 3, pp. 841–849. https://doi.org/10.3758/BRM.41.3.841
  22. Fleiss J.L. Measuring nominal scale agreement among many raters. Psychological Bulletin, 1971, vol. 76, no. 5, pp. 378–382. https://doi.org/10.1037/h0031619
  23. Uzdiaev M.Iu., Karpov A.A. Audiovisual Aggressive Behavior in Online Streams dataset – AVABOS. Certificate of state registration of the database 2022623239, 2022. (in Russian)
  24. Landis J.R., Koch G.G. The measurement of observer agreement for categorical data. Biometrics, 1977, vol. 33, no. 1, pp. 159–174. https://doi.org/10.2307/2529310
  25. Fleiss J.L., Levin B., Paik M.C. Statistical Methods for Rates and Proportions. John Wiley & Sons, 2013, 800 p.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright 2001-2024 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.
