Methods for audiovisual recognition of people in masks

Kirill E. Kosulin, Alexey A. Karpov

2022, Volume 22, Number 3 (March-April)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2022-22-3-415-432

Article in Russian

For citation:

Kosulin K.E., Karpov A.A. Methods for audiovisual recognition of people in masks. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 3, pp. 415–432 (in Russian). doi: 10.17586/2226-1494-2022-22-3-415-432

Abstract

Masks, respirators and other face coverings are now widely worn, and the novel coronavirus pandemic that began in 2019 has significantly increased their use in public places. The most effective methods of person recognition rely on face images and voice recordings. However, person recognition systems face new challenges because a mask covers most of the subject's face. These challenges make research on masked person recognition relevant; the subject of this study is therefore the systems and datasets used for recognizing people in masks. The article analyzes the main approaches to recognizing the identity of a person in a mask: masked face recognition, masked voice recognition and audiovisual methods. It also presents a comparative analysis of the image and audio recording datasets required for person recognition systems. The results show that, among methods that use face images, the most effective are those based on convolutional neural networks and mask-area feature extraction. Methods based on x-vector analysis showed only a slight drop in performance, which suggests that they are applicable to recognizing the identity of a speaker in a mask. The results help to formulate requirements for prospective masked person recognition systems and to determine directions for further research.

Keywords: person recognition, facial biometrics, voice biometrics, medical masks, personal protective equipment, audiovisual features, information fusion

Acknowledgements. This work was partially supported by the RFBR (project No. 20-04-60529), the Council for Grants of the President of Russia (grant No. NSH-17.2022.1.6), and Russian state research (No. 0073-2019-0005).

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License