doi: 10.17586/2226-1494-2018-18-6-1016-1022


GRADIENT BOOSTING TREES METHOD IN THE TASK OF SOFTWARE IDENTIFICATION

K. I. Salakhutdinova, I. S. Lebedev, I. E. Krivtsova


Article in Russian

For citation:
Salakhutdinova K.I., Lebedev I.S., Krivtsova I.E. Gradient boosting trees method in the task of software identification. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2018, vol. 18, no. 6, pp. 1016–1022 (in Russian). doi: 10.17586/2226-1494-2018-18-6-1016-1022


Abstract
Subject of Research. The paper proposes an approach to software identification based on the gradient boosted decision trees algorithm, using the CatBoost implementation developed by Yandex. The approach targets the identification of software on Linux systems in order to reduce the number of vulnerabilities that arise when users of an automated system install unauthorized software. We consider the formation of program signatures and the subsequent training of a CatBoostClassifier model. The recognition task is then posed for programs that were not previously involved in model training. Method. The free CatBoost software was used to implement the gradient boosted decision trees algorithm, and a CatBoostClassifier multi-class classification model was built on its basis. This model makes it possible to identify the elf-files of the test sample. Main Results. Training parameters for the classification model were selected. An experiment was carried out to identify elf-files using ten different features for forming program signatures. The results of the new approach are compared with those of the previously developed identification method based on the chi-square homogeneity criterion at the significance level p = 0.01. Practical Relevance. The results of the study can be recommended to information security specialists for auditing data storage media. The developed approach makes it possible to detect violations of the established security policy in the processing of confidential information.
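
The chi-square homogeneity criterion used as the comparison baseline can be illustrated with a short sketch. It is not the authors' implementation: the abstract does not say how signatures are formed, so a plain byte-frequency histogram is assumed here, and all function names (byte_histogram, chi_square_homogeneity, chi_square_critical, same_program) are hypothetical. The critical value is approximated with the Wilson-Hilferty formula so that only the Python standard library is needed.

```python
import math
from collections import Counter
from statistics import NormalDist


def byte_histogram(data: bytes) -> list:
    """Assumed signature: how often each of the 256 byte values occurs."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def chi_square_homogeneity(a, b):
    """Two-sample chi-square homogeneity statistic and its degrees of freedom."""
    n_a, n_b = sum(a), sum(b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both signatures must contain at least one count")
    total = 0.0
    used_bins = 0
    for x, y in zip(a, b):
        if x + y == 0:
            continue  # a bin that is empty in both samples carries no information
        used_bins += 1
        total += (x / n_a - y / n_b) ** 2 / (x + y)
    return n_a * n_b * total, used_bins - 1


def chi_square_critical(df: int, alpha: float = 0.01) -> float:
    """Wilson-Hilferty approximation of the upper-alpha chi-square quantile."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def same_program(sig_a, sig_b, alpha: float = 0.01) -> bool:
    """True when the homogeneity hypothesis is not rejected at level alpha."""
    stat, df = chi_square_homogeneity(sig_a, sig_b)
    if df < 1:
        return True  # all counts fall into a single shared bin
    return stat <= chi_square_critical(df, alpha)


if __name__ == "__main__":
    reference = byte_histogram(bytes(range(256)) * 4)
    candidate = byte_histogram(b"\x00" * 1024)
    print(same_program(reference, reference))
    print(same_program(reference, candidate))
```

In this sketch a candidate file is attributed to a known program when the statistic stays below the critical value for p = 0.01. A classifier such as CatBoostClassifier instead learns the decision from labelled signatures.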

Keywords: machine learning, gradient boosting trees, CatBoost, executable files identification, elf-files, information security

Acknowledgements. This work was carried out under research theme No. 0073-2018-0008.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
