doi: 10.17586/2226-1494-2023-23-5-946-954
Method for testing NLP models with text adversarial examples
Article in Russian
For citation:
Menisov A.B., Lomako A.G., Sabirov T.R. Method for testing NLP models with text adversarial examples. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2023, vol. 23, no. 5, pp. 946–954 (in Russian). doi: 10.17586/2226-1494-2023-23-5-946-954
Abstract
At present, the interpretability of Natural Language Processing (NLP) models remains unsatisfactory because the scientific and methodological apparatus for describing how both individual elements and whole models operate is still imperfect. One consequence of poor interpretability is the low reliability of neural networks that process natural language text: small perturbations of text data are known to undermine their stability. The paper presents a method for testing NLP models against the threat of evasion attacks. The method generates text adversarial examples in two ways: by random text modification and by a modification generation network. Random text modification uses homoglyphs, rearranges text, adds invisible characters, and removes characters at random. The modification generation network is based on a generative adversarial neural network architecture. The experiments demonstrated the effectiveness of the testing method based on the network for generating text adversarial examples. The advantage of the developed method is, first, that it can generate more natural and diverse adversarial examples subject to fewer restrictions and, second, that it does not require multiple queries to the model under test, which makes it applicable to more complex test scenarios where interaction with the model is limited. The experiments showed that the developed method achieves a comparatively better balance between the effectiveness and the stealth of textual adversarial examples (the GigaChat and YaGPT models were among those tested). The results demonstrate the need to test NLP models for defects and vulnerabilities that attackers can exploit to degrade their quality of operation, and they point to substantial potential for ensuring the reliability of machine learning models. A promising direction for future work is the problem of restoring the security level (confidentiality, availability, and integrity) of NLP models.
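The random text modification component described above can be illustrated with a short sketch. The Python code below applies the four perturbation types named in the abstract (homoglyph substitution, rearranging adjacent characters, inserting invisible characters, and deleting characters) at a configurable rate. It is a minimal illustrative sketch, not the authors' implementation: the function name, the perturbation rate, and the homoglyph table are assumptions made for demonstration only.

```python
import random

# Small Latin -> Cyrillic look-alike table (assumed; the paper does not list one).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e",
              "p": "\u0440", "c": "\u0441", "x": "\u0445"}
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d"]  # invisible (zero-width) characters


def perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly modify text with homoglyphs, adjacent swaps, invisible characters, deletions."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        ch = chars[i]
        if rng.random() < rate:
            op = rng.choice(["homoglyph", "swap", "invisible", "delete"])
            if op == "homoglyph" and ch.lower() in HOMOGLYPHS:
                out.append(HOMOGLYPHS[ch.lower()])        # substitute a look-alike character
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], ch])            # rearrange two adjacent characters
                i += 1                                    # skip the character already emitted
            elif op == "invisible":
                out.extend([ch, rng.choice(ZERO_WIDTH)])  # insert an invisible character
            elif op == "delete":
                pass                                      # remove the character
            else:
                out.append(ch)                            # no applicable modification
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(perturb("The service quality was excellent", rate=0.2, seed=42))
```

Texts perturbed in this way can be fed to the model under test alongside the originals to check whether its predictions change while the text stays readable to a human, which is the essence of the evasion-attack testing the abstract describes.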
Keywords: artificial intelligence, natural language processing, information security, adversarial attacks, security testing
Acknowledgements. The work was carried out within the framework of the grant of the President of the Russian Federation for state support of young Russian scientists — candidates of sciences MK-2485.2022.4.