doi: 10.17586/2226-1494-2025-25-1-114-127
Enhancing and extending CatBoost for accurate detection and classification of DoS and DDoS attack subtypes in network traffic
Article in English
For citation:
Hajjouz A., Avksentieva E.Yu. Enhancing and extending CatBoost for accurate detection and classification of DoS and DDoS attack subtypes in network traffic. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2025, vol. 25, no. 1, pp. 114–127. doi: 10.17586/2226-1494-2025-25-1-114-127
Abstract
In the ever-changing digital world, the rise of sophisticated cyber threats, especially DoS and DDoS attacks, poses a major challenge to Information Security. This paper addresses the problem of distinguishing malicious from benign network traffic using the CatBoost classifier, a machine learning algorithm optimized for categorical data and imbalanced datasets. Using the CIC-IDS2017 and CSE-CIC-IDS2018 datasets, which simulate various cyberattack scenarios, we optimized CatBoost to identify specific subtypes of DoS and DDoS attacks, including Hulk, SlowHTTPTest, GoldenEye, Slowloris, HOIC, LOIC-UDP-HTTP, and LOIT. The methodology involved data preparation (normalizing outliers, correcting negative values, and refining dataset structures), feature selection, and model configuration. Stratified sampling ensured a balanced representation of classes in the training, validation, and testing sets. The CatBoost model achieved an overall accuracy of 0.999922, with high precision, recall, and F1-scores across all categories, and it can process over 3.4 million samples per second. These results indicate that the model is robust and reliable for real-time intrusion detection. By classifying specific attack types, the model improves the precision of Intrusion Detection Systems (IDS) and allows a targeted response to different threats. The substantial gain in detection accuracy addresses both the problem of imbalanced datasets and the need for granular detection of attack types. CatBoost can be used in advanced Information Security frameworks for critical infrastructure, cloud services, and enterprise networks to defend against digital threats. This paper provides a fast, accurate, and scalable solution for network IDS and shows the importance of custom machine learning models in Information Security. Future work should evaluate CatBoost on more datasets and integrate it with other machine learning techniques to improve robustness and detection.
Keywords: information security, network intrusion detection, DoS attacks, DDoS attacks, machine learning, real-time detection, feature selection, model optimization
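The stratified sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give split ratios, class counts, or implementation details, so everything below is an assumption made for illustration: the helper name `stratified_split`, the 70/15/15 fractions, the seed, and the toy label counts are hypothetical and are not the authors' code or data. The sketch only shows the general idea of splitting each class separately so that rare attack subtypes appear in the training, validation, and testing sets in roughly the same proportion as in the full dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve per-class proportions.

    The fractions are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)  # shuffle within the class only
        n_train = round(len(idxs) * fractions[0])
        n_val = round(len(idxs) * fractions[1])
        train.extend(idxs[:n_train])
        val.extend(idxs[n_train:n_train + n_val])
        test.extend(idxs[n_train + n_val:])  # remainder goes to the test set
    return train, val, test


# Toy, heavily imbalanced label column (hypothetical counts, not from the paper).
labels = ["Benign"] * 1000 + ["DoS Hulk"] * 100 + ["DoS Slowloris"] * 20
train, val, test = stratified_split(labels)

for name, part in (("train", train), ("val", val), ("test", test)):
    print(name, dict(Counter(labels[i] for i in part)))
```

A purely random split of the same toy data could leave only a handful of the 20 Slowloris samples, or none, in the validation or testing set; splitting per class avoids that, which matters when per-subtype precision and recall are being reported.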