AUDIO-REPLAY ATTACKS SPOOFING DETECTION FOR SPEAKER RECOGNITION SYSTEMS

Galina M. Lavrentyeva, Sergey A. Novoselov, Alexander V. Kozlov, Oleg Yu. Kudashev, Vadim L. Shchemelinin, Yuri N. Matveev, Maria De Marsico

doi:10.17586/2226-1494-2018-18-3-428-436

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

Article in Russian

For citation: Lavrentyeva G.M., Novoselov S.A., Kozlov A.V., Kudashev O.Yu., Shchemelinin V.L., Matveev Yu.N., De Marsico M. Audio-replay attacks spoofing detection for speaker recognition systems. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2018, vol. 18, no. 3, pp. 428–436 (in Russian). doi: 10.17586/2226-1494-2018-18-3-428-436

Abstract

Subject of Research. This work considers the problem of detecting replay attacks on voice biometric systems. Because replay attacks are simple to mount, impostors are more likely to use them, which makes them a particular risk. The paper describes the replay attack detection system that was presented at the Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge 2017, which focused on this problem. Method. We study the effectiveness of a deep learning approach to this task, in particular convolutional neural networks with the Max-Feature-Map activation function. Main Results. Experiments on the Challenge corpora show that this approach clearly outperforms the current state-of-the-art baseline systems. Our primary system achieved an equal error rate (EER) of 6.73% on the evaluation part of the corpora, a 72% relative improvement over the ASVspoof 2017 baseline system. Practical Relevance. The results can be applied in the field of voice biometrics: the presented methods can be used in automatic speaker verification and identification systems to detect spoofing attacks against them.
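The abstract names two technical ingredients without spelling them out: the Max-Feature-Map (MFM) activation and the EER metric with its relative improvement. The sketch below illustrates each in plain Python. It is not the authors' implementation: the function names and the toy inputs are hypothetical, the MFM shown is the common formulation (split the channels into two halves and keep the element-wise maximum), and the EER is a simple threshold scan that assumes a higher score means "more likely genuine". The abstract does not state the baseline EER, so the numbers passed to relative_improvement are placeholders.

```python
from typing import List, Sequence


def max_feature_map(channels: Sequence[Sequence[float]]) -> List[List[float]]:
    """MFM sketch: pair channel k with channel k + half, keep the element-wise max.

    Each channel is a flattened feature map; the output has half as many channels.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    return [
        [max(a, b) for a, b in zip(channels[k], channels[k + half])]
        for k in range(half)
    ]


def equal_error_rate(genuine: Sequence[float], spoof: Sequence[float]) -> float:
    """Approximate EER by scanning the observed scores as thresholds.

    Assumption: a higher score means "more likely genuine".
    """
    best_gap, eer = float("inf"), 1.0
    for threshold in sorted(set(genuine) | set(spoof)):
        far = sum(s >= threshold for s in spoof) / len(spoof)      # spoof accepted
        frr = sum(g < threshold for g in genuine) / len(genuine)   # genuine rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer: float, system_eer: float) -> float:
    """Relative EER reduction with respect to a baseline."""
    return (baseline_eer - system_eer) / baseline_eer


# Toy data, made up for illustration only (not taken from the paper).
toy_channels = [[1, -2, 3], [0, 5, -1], [4, -3, 2], [-1, 6, 0]]
print(max_feature_map(toy_channels))
print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6]))
print(relative_improvement(10.0, 2.5))
```

Because MFM keeps only the larger response in each channel pair, it halves the channel count and acts as a learned feature selector rather than a fixed threshold such as ReLU. In a real system the same operation would be applied to tensors inside a deep learning framework.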

Keywords: spoofing, replay attack detection, CNN, RNN, ASVspoof

Acknowledgements. This work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract No. 14.578.21.0189 dated 3.10.2016 (ID RFMEFI57816X0189).

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License