doi: 10.17586/2226-1494-2018-18-6-1016-1022
GRADIENT BOOSTING TREES METHOD IN THE TASK OF SOFTWARE IDENTIFICATION
Article in Russian
For citation:
Salakhutdinova K.I., Lebedev I.S., Krivtsova I.E. Gradient boosting trees method in the task of software identification. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2018, vol. 18, no. 6, pp. 1016–1022 (in Russian). doi: 10.17586/2226-1494-2018-18-6-1016-1022
Abstract
Subject of Research. The paper proposes an approach to software identification based on gradient boosted decision trees, using the CatBoost algorithm developed by Yandex. The approach is applied to identifying software on Linux systems in order to reduce the number of vulnerabilities that arise when users of an automated system install unauthorized software. We consider how program signatures are formed and how a CatBoostClassifier model is then trained on them. The recognition task is then posed for programs that were not involved in training the model. Method. The free CatBoost software was used to implement the gradient boosted decision trees algorithm, and a multiclass CatBoostClassifier model was built on its basis. This model is used to identify the ELF files of the test sample. Main Results. Training parameters for the classification model were selected. An experiment was carried out to identify ELF files using ten different features from which program signatures are formed. The results of the new approach were compared with those of a previously developed identification method based on the chi-square homogeneity test at the significance level p = 0.01. Practical Relevance. The results can be recommended to information security specialists for auditing data storage media. The developed approach makes it possible to detect violations of the established security policy when confidential information is processed.
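The baseline mentioned in the abstract, a chi-square homogeneity test at p = 0.01, can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' procedure: the abstract does not list the ten features, so a plain 256-bin byte histogram stands in as a hypothetical signature, and the critical value is computed with the Wilson-Hilferty approximation instead of a statistical table. The function names are invented for illustration.

```python
import math
from collections import Counter
from statistics import NormalDist
from typing import List, Sequence, Tuple


def byte_histogram(data: bytes) -> List[int]:
    """Count each byte value 0..255 (assumed stand-in for a program signature)."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def chi_square_homogeneity(a: Sequence[int], b: Sequence[int]) -> Tuple[float, int]:
    """Two-sample chi-square homogeneity statistic and its degrees of freedom."""
    n_a, n_b = sum(a), sum(b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    used_bins = 0
    for x, y in zip(a, b):
        if x + y == 0:
            continue  # bin empty in both samples, contributes nothing
        used_bins += 1
        total += (x / n_a - y / n_b) ** 2 / (x + y)
    return n_a * n_b * total, used_bins - 1


def chi_square_critical(df: int, alpha: float = 0.01) -> float:
    """Approximate upper-tail critical value (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    t = 2.0 / (9.0 * df)
    return df * (1.0 - t + z * math.sqrt(t)) ** 3


def same_distribution(a: Sequence[int], b: Sequence[int], alpha: float = 0.01) -> bool:
    """True if homogeneity is not rejected at significance level alpha."""
    statistic, df = chi_square_homogeneity(a, b)
    if df < 1:
        return True  # all mass in a single shared bin
    return statistic <= chi_square_critical(df, alpha)


if __name__ == "__main__":
    reference = byte_histogram(b"\x7fELF" + bytes(range(256)) * 8)
    candidate = byte_histogram(b"\x7fELF" + bytes(range(256)) * 8)
    print(same_distribution(reference, candidate))
```

In a setup like the one the abstract describes, the same kind of feature vector, labeled with the program name, could serve as a training row for a multiclass `CatBoostClassifier`; that step is omitted here to keep the sketch free of third-party dependencies.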
Keywords: machine learning, gradient boosting trees, CatBoost, executable files identification, elf-files, information security
Acknowledgements. The work was carried out under theme № 0073-2018-0008.