doi: 10.17586/2226-1494-2024-24-5-834-842
Creation and analysis of multimodal corpus for aggressive behavior recognition
Article in Russian
For citation:
Uzdiaev M.Yu., Karpov A.A. Creation and analysis of multimodal corpus for aggressive behavior recognition. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 5, pp. 834–842 (in Russian). doi: 10.17586/2226-1494-2024-24-5-834-842
Abstract
The development of digital communication systems has been accompanied by a growing number of disruptive behavior incidents that require a rapid response to prevent negative consequences. Because human aggression is weakly formalized, machine learning approaches are the most suitable for this area; such approaches, however, require representative sets of relevant data for efficient aggression recognition. Developing such datasets raises problems including the relevance of dataset labels to real behavior, the consistency of the situations in which the behavior is manifested, and the naturalness of the behavior. The purpose of this work is to develop a methodology for creating aggressive behavior datasets that reflects the key aspects of aggression and provides relevant data. The paper presents the developed methodology for creating multimodal datasets of natural aggressive behavior. An analysis of the subject area of human aggression substantiates the key aspects of aggression manifestation (the presence of a subject and an object of aggression, and the destructiveness of the aggressive action), defines the units of behavior analysis (time intervals of audio and video with localized informants), specifies the types of aggression considered (overt direct physical and verbal aggression), and substantiates criteria for assessing aggressive behavior as a set of aggressive actions that define each aggression type. The methodology consists of the following stages: collecting video on the Internet; identifying the time intervals where aggression is performed; localizing informants in video frames; transcribing informants' speech; collective labeling of physical and verbal aggression actions by a group of annotators (raters); and assessing the reliability of annotation agreement using Fleiss' kappa coefficient. To evaluate the methodology, a new audiovisual corpus of aggressive behavior in online streams (AVABOS) was collected and labeled.
The dataset contains audio and video segments with verbal and physical aggression, respectively, manifested by Russian-speaking informants during online video streams. Interrater agreement is substantial for physical aggression (κ = 0.74) and moderate for verbal aggression (κ = 0.48), which supports the developed methodology. The AVABOS dataset can be used in automatic aggression recognition tasks, and the methodology can also be applied to creating datasets for other types of behavior.
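The agreement scores reported above (κ = 0.74 and κ = 0.48) are Fleiss' kappa values. As a minimal illustrative sketch (not the authors' implementation), the coefficient can be computed from a subjects-by-categories table of rater counts:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of
    raters who assigned subject i to category j (same rater count per subject)."""
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])  # raters per subject, assumed constant
    # Observed per-subject agreement P_i
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement P_e from marginal category proportions
    totals = [sum(row[j] for row in ratings) for j in range(len(ratings[0]))]
    p_j = [t / (n_subjects * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

On the commonly used Landis–Koch scale, values of 0.41–0.60 are read as moderate agreement and 0.61–0.80 as substantial agreement, matching the interpretations given in the abstract.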
Keywords: methodology for creating multimodal datasets, methodology for behavior assessment, aggressive behavior, aggression recognition, dataset creation, collective labeling, interrater reliability assessment, Fleiss' kappa coefficient
Acknowledgements. This work was supported financially by the Russian Science Foundation (project No. 22-11-00321, https://www.rscf.ru/project/22-11-00321/).