doi: 10.17586/2226-1494-2024-24-2-256-266


A new method for countering evasion adversarial attacks on information systems based on artificial intelligence

A. A. Vorobeva, M. A. Matuzko, D. I. Sivkov, R. I. Safiullin, A. A. Menshchikov


Article in English

For citation:
Vorobeva A.A., Matuzko M.A., Sivkov D.I., Safiullin R.I., Menshchikov A.A. A new method for countering evasion adversarial attacks on information systems based on artificial intelligence. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 2, pp. 256–266. doi: 10.17586/2226-1494-2024-24-2-256-266


Abstract
Modern artificial intelligence (AI) technologies are used in a variety of fields, from science to everyday life. However, the widespread use of AI-based systems has highlighted their vulnerability to adversarial attacks: methods of fooling or misleading an artificial neural network, disrupting its operation, and causing it to make incorrect predictions. This study focuses on protecting image recognition models against adversarial evasion attacks, which are recognized as the most challenging and dangerous. In these attacks, adversaries craft adversarial data containing minor perturbations of the original image, such as added noise or even a few changed pixels, and send it to a trained model in an attempt to change its response to the desired outcome.

In this paper, we consider the most relevant methods for generating adversarial data: the Fast Gradient Sign Method (FGSM), the Square attack (SQ), Projected Gradient Descent (PGD), the Basic Iterative Method (BIM), the Carlini–Wagner method (CW), and the Jacobian-based Saliency Map Attack (JSMA). We also study modern techniques for defending against evasion attacks through model modification, such as adversarial training and defensive distillation, and through pre-processing of incoming data, including spatial smoothing, feature squeezing, JPEG compression, and total variance minimization. While these methods are effective against certain types of attacks, to date no single method can serve as a universal defense. Instead, we propose a new method that combines adversarial training with image pre-processing. Adversarial training is performed on adversarial samples generated by common attack methods, which can then be defended against effectively, while the image pre-processing aims to counter attacks that were not considered during adversarial training, protecting the system from new types of attacks. JPEG compression and feature squeezing are used at the pre-processing stage; this reduces the impact of adversarial perturbations and effectively counteracts all of the considered attacks.

The performance of an image recognition model based on a convolutional neural network was evaluated. The experimental data included original images and adversarial images created with the FGSM, PGD, BIM, SQ, CW, and JSMA attacks, while adversarial training of the model was performed only on data containing adversarial examples for the FGSM, PGD, and BIM attacks. The dataset used in the experiments was balanced, and the average image recognition accuracy was estimated on the crafted adversarial image datasets. It was concluded that adversarial training is effective only in countering the attacks used during model training, whereas pre-processing of incoming data is effective only against simpler attacks. The average recognition accuracy of the developed method was 0.94, significantly higher than that of the other countermeasures considered: the accuracy without any countermeasures is approximately 0.19, with adversarial training 0.79, with spatial smoothing 0.58, with feature squeezing 0.88, with JPEG compression 0.37, with total variance minimization 0.58, and with defensive distillation 0.44.
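As an illustration of the pre-processing stage described above, the sketch below applies JPEG compression followed by feature squeezing (colour bit-depth reduction) to an input image before it is passed to the classifier. This is a minimal example assuming Pillow and NumPy; the function name and parameter values (JPEG quality 75, 4-bit colour depth) are illustrative assumptions and are not taken from the article.

```python
# Illustrative pre-processing sketch (Pillow + NumPy assumed); the quality and
# bit-depth values are example settings, not the parameters used in the paper.
import io

import numpy as np
from PIL import Image


def preprocess_image(image: np.ndarray, jpeg_quality: int = 75, bit_depth: int = 4) -> np.ndarray:
    """Apply JPEG compression and colour-depth squeezing to a uint8 RGB image."""
    # 1. JPEG compression: encode to JPEG in memory and decode back, discarding
    #    high-frequency detail that often carries the adversarial perturbation.
    buf = io.BytesIO()
    Image.fromarray(image).save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    compressed = np.asarray(Image.open(buf).convert("RGB"), dtype=np.float64)
    # 2. Feature squeezing: reduce each colour channel from 8 bits to `bit_depth` bits,
    #    collapsing small pixel-level perturbations onto the same quantized value.
    levels = 2 ** bit_depth - 1
    squeezed = np.round(compressed / 255.0 * levels) / levels * 255.0
    return squeezed.astype(np.uint8)
```

Both transformations deliberately discard fine-grained information, which is why they blunt small adversarial perturbations while leaving the semantic content of the image largely intact.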
For the FGSM, PGD, BIM, SQ, CW, and JSMA attacks, the developed method provides image recognition accuracy of 0.99, 0.99, 0.98, 0.98, 0.99, and 0.73, respectively. It is thus a more universal solution for countering all of the considered attack types and works effectively even against complex adversarial attacks such as CW and JSMA. The developed method increases the accuracy of the image recognition model on adversarial images and, unlike adversarial training alone, also increases recognition accuracy on adversarial data generated by attacks not used at the training stage. The results are useful for researchers and practitioners in the field of machine learning.
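For reference, the sketch below shows how FGSM adversarial examples of the kind discussed above can be crafted. It assumes PyTorch and an illustrative epsilon value and is not the authors' implementation; in adversarial training, such perturbed images are mixed back into the training data.

```python
# Minimal FGSM sketch (PyTorch assumed; epsilon is an illustrative value).
import torch
import torch.nn.functional as F


def fgsm_examples(model: torch.nn.Module, images: torch.Tensor, labels: torch.Tensor,
                  epsilon: float = 0.03) -> torch.Tensor:
    """Craft FGSM adversarial images for a batch of inputs scaled to [0, 1]."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Shift every pixel by epsilon in the direction of the loss gradient sign,
    # then clip back to the valid pixel range.
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```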

Keywords: machine learning methods, adversarial attacks, defense mechanisms, AI-based information systems, adversarial learning

References
  1. Szegedy C., Zaremba W., Sutskever I., Bruna J., Erhan D., Goodfellow I., Fergus R. Intriguing properties of neural networks. arXiv, 2013, arXiv:1312.6199. https://doi.org/10.48550/arXiv.1312.6199
  2. Tabassi E., Burns K.J., Hadjimichael M., Molina-Markham A.D., Sexton J.T. A taxonomy and terminology of adversarial machine learning. NIST IR, 2019, pp. 1–29.
  3. Goodfellow I.J., Shlens J., Szegedy C. Explaining and harnessing adversarial examples. arXiv, 2015, arXiv:1412.6572. https://doi.org/10.48550/arXiv.1412.6572
  4. Carlini N., Mishra P., Vaidya T., Zhang Y., Sherr M., Shields C., Wagner D., Zhou W. Hidden voice commands. Proc. of the 25th USENIX Security Symposium, 2016, pp. 513–530.
  5. Zhang G., Yan C., Ji X., Zhang T., Zhang T., Xu W. Dolphinattack: Inaudible voice commands. Proc. of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 103–117. https://doi.org/10.1145/3133956.3134052
  6. Kurakin A., Goodfellow I.J., Bengio S. Adversarial machine learning at scale. International Conference on Learning Representations (ICLR), 2017.
  7. Li X., Zhu D. Robust detection of adversarial attacks on medical images. Proc. of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020, pp. 1154–1158. https://doi.org/10.1109/isbi45749.2020.9098628
  8. Imam N.H., Vassilakis V.G. A survey of attacks against twitter spam detectors in an adversarial environment. Robotics, 2019, vol. 8, no. 3, pp. 50. https://doi.org/10.3390/robotics8030050
  9. Andriushchenko M., Croce F., Flammarion N., Hein M. Square attack: a query-efficient black-box adversarial attack via random search. Lecture Notes in Computer Science, 2020, vol. 12368, pp. 484–501. https://doi.org/10.1007/978-3-030-58592-1_29
  10. Deng Y., Karam L.J. Universal adversarial attack via enhanced projected gradient descent. Proc. of the 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 1241–1245. https://doi.org/10.1109/icip40778.2020.9191288
  11. Madry A., Makelov A., Schmidt L., Tsipras D., Vladu A. Towards deep learning models resistant to adversarial attacks. International Conference on Learning Representations (ICLR), 2018.
  12. Kurakin A., Goodfellow I.J., Bengio S. Adversarial examples in the physical world. Artificial Intelligence Safety and Security, 2018, pp. 99–112. https://doi.org/10.1201/9781351251389-8
  13. Carlini N., Wagner D. Towards evaluating the robustness of neural networks. Proc. of the 2017 IEEE Symposium on Security and Privacy (SP), 2017, pp. 39–57. https://doi.org/10.1109/sp.2017.49
  14. Papernot N., McDaniel P., Jha S., Fredrikson M., Celik Z.B., Swami A. The limitations of deep learning in adversarial settings. Proc. of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P), 2016, pp. 372–387. https://doi.org/10.1109/eurosp.2016.36
  15. Lowd D., Meek C. Adversarial learning. Proc. of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, 2005, pp. 641–647. https://doi.org/10.1145/1081870.1081950
  16. Das N., Shanbhogue M., Chen S.-T., Hohman F., Chen L., Kounavis M.E., Chau D.H. Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression. arXiv, 2017, arXiv:1705.02900. https://doi.org/10.48550/arXiv.1705.02900
  17. Guo C., Rana M., Cisse M., van der Maaten L. Countering adversarial images using input transformations. International Conference on Learning Representations (ICLR), 2018.
  18. Xu W., Evans D., Qi Y. Feature squeezing: detecting adversarial examples in deep neural networks. Proc. of the 2018 Network and Distributed System Security Symposium, 2018. https://doi.org/10.14722/ndss.2018.23198
  19. Papernot N., McDaniel P., Wu X., Jha S., Swami A. Distillation as a defense to adversarial perturbations against deep neural networks. Proc. of the 2016 IEEE Symposium on Security and Privacy (SP), 2016, pp. 582–597. https://doi.org/10.1109/sp.2016.41


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.