doi: 10.17586/2226-1494-2023-23-5-946-954

Method for testing NLP models with text adversarial examples

A. B. Menisov, A. G. Lomako, T. R. Sabirov

Article in Russian

For citation:
Menisov A.B., Lomako A.G., Sabirov T.R. Method for testing NLP models with text adversarial examples. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2023, vol. 23, no. 5, pp. 946–954 (in Russian). doi: 10.17586/2226-1494-2023-23-5-946-954

The interpretability of Natural Language Processing (NLP) models remains unsatisfactory because the scientific and methodological apparatus for describing how individual elements and whole models function is still immature. One consequence of poor interpretability is the low reliability of neural networks that process natural language texts: small perturbations in text data are known to affect their stability. The paper presents a method for testing NLP models against the threat of evasion attacks. The method generates adversarial text examples in two ways: random text modification and a modification generation network. Random text modification uses homoglyphs, text rearrangement, insertion of invisible characters, and random removal of characters. The modification generation network is based on a generative adversarial neural network architecture. The experiments demonstrated the effectiveness of the testing method based on the network for generating adversarial text examples. The developed method has two advantages: first, it can generate more natural and diverse adversarial examples that are subject to fewer restrictions; second, it does not require multiple requests to the model under test. It may therefore be applicable in more complex test scenarios where interaction with the model is limited. The experiments showed that the developed method achieves a relatively better balance between the effectiveness and the stealth of adversarial text examples (for example, the GigaChat and YaGPT models were tested). The results show the need to test NLP models for defects and vulnerabilities that attackers can exploit to degrade the quality of their functioning. This indicates considerable potential for ensuring the reliability of machine learning models.
A promising direction for further work is restoring the level of security (confidentiality, availability and integrity) of NLP models.
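
The random text modification named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the homoglyph table, the choice of the zero-width space as the invisible character, the reading of "rearranging text" as swapping two adjacent characters, and all function names and default rates are assumptions made here for illustration only.

```python
import random

# Hypothetical homoglyph table (Latin -> visually similar Cyrillic letters).
# The table used in the paper is not given in the abstract.
HOMOGLYPHS = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435",
    "o": "\u043e", "p": "\u0440", "x": "\u0445",
}
ZERO_WIDTH_SPACE = "\u200b"  # one possible invisible character (assumption)


def swap_homoglyphs(text, rng, rate=0.3):
    """Replace each mappable character with its homoglyph with probability `rate`."""
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )


def swap_adjacent(text, rng):
    """Rearrange the text by swapping one randomly chosen pair of adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def insert_invisible(text, rng, count=2):
    """Insert `count` zero-width spaces at random positions."""
    chars = list(text)
    for _ in range(count):
        chars.insert(rng.randrange(len(chars) + 1), ZERO_WIDTH_SPACE)
    return "".join(chars)


def delete_random(text, rng, count=1):
    """Remove up to `count` characters at random positions."""
    chars = list(text)
    for _ in range(min(count, len(chars))):
        del chars[rng.randrange(len(chars))]
    return "".join(chars)


def perturb(text, seed=None):
    """Apply one randomly chosen modification; a fixed seed makes the result repeatable."""
    rng = random.Random(seed)
    op = rng.choice([swap_homoglyphs, swap_adjacent, insert_invisible, delete_random])
    return op(text, rng)
```

A test harness in this style would pass `perturb(sample, seed=k)` for several seeds to the model under test and compare its outputs with the output for the unmodified `sample`. The generative adversarial network variant described in the abstract is not sketched here, since the abstract gives no architectural details.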

Keywords: artificial intelligence, natural language processing, information security, adversarial attacks, security testing

Acknowledgements. The work was supported by a grant of the President of the Russian Federation for state support of young Russian scientists (candidates of sciences), grant MK-2485.2022.4.


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright 2001-2024 © Scientific and Technical Journal of Information Technologies, Mechanics and Optics. All rights reserved.