doi: 10.17586/2226-1494-2025-25-1-114-127


Enhancing and extending CatBoost for accurate detection and classification of DoS and DDoS attack subtypes in network traffic

A. Hajjouz, E. Y. Avksentieva


Article in English

For citation:
Hajjouz A., Avksentieva E.Yu. Enhancing and extending CatBoost for accurate detection and classification of DoS and DDoS attack subtypes in network traffic. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2025, vol. 25, no. 1, pp. 114–127. doi: 10.17586/2226-1494-2025-25-1-114-127


Abstract
In the ever-changing digital world, the rise of sophisticated cyber threats, especially DoS and DDoS attacks, poses a major challenge to information security. This paper addresses the problem of distinguishing malicious from benign network traffic using the CatBoost classifier, a machine learning algorithm optimized for categorical data and imbalanced datasets. Using the CIC-IDS2017 and CSE-CIC-IDS2018 datasets, which simulate various cyberattack scenarios, we optimized CatBoost to identify specific subtypes of DoS and DDoS attacks, including Hulk, SlowHTTPTest, GoldenEye, Slowloris, HOIC, LOIC-UDP-HTTP, and LOIT. The methodology comprised data preparation, feature selection, and model configuration, along with normalizing outliers, correcting negative values, and refining dataset structures. Stratified sampling ensured a balanced representation of classes across the training, validation, and test sets. The CatBoost model achieved an overall accuracy of 0.999922, with high precision, recall, and F1-scores across all categories, and it can process over 3.4 million samples per second. These results indicate that the model is robust and reliable for real-time intrusion detection. By classifying specific attack types, the model improves the precision of Intrusion Detection Systems (IDS) and allows for a targeted response to different threats. The substantial gain in detection accuracy addresses the problem of imbalanced datasets and the need for granular attack-type detection. The model can be deployed in advanced information security frameworks for critical infrastructure, cloud services, and enterprise networks to defend against digital threats. This paper provides a fast, accurate, and scalable solution for network IDS and shows the importance of custom machine learning models in information security. Future work should explore CatBoost on more datasets and integrate it with other machine learning techniques to improve robustness and detection.

Keywords: information security, network intrusion detection, DoS attacks, DDoS attacks, machine learning, real-time detection, feature selection, model optimization
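
The stratified sampling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the 70/15/15 split fractions, the toy label counts, and the helper name `stratified_split` are assumptions made for illustration only, since the abstract does not state the split ratios used. The idea is to shuffle and slice each class separately, so that rare attack subtypes appear in the training, validation, and test sets in roughly the same proportion as in the full dataset.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve per-class proportions.

    Hypothetical helper; the fractions are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        # Slice each class on its own so every split keeps the class ratio.
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Toy, made-up label vector: heavily imbalanced, as traffic captures tend to be.
labels = ["Benign"] * 1000 + ["DoS Hulk"] * 200 + ["DoS Slowloris"] * 40
train, val, test = stratified_split(labels)

for name, part in (("train", train), ("val", val), ("test", test)):
    print(name, dict(Counter(labels[i] for i in part)))
```

In practice, a library routine with a stratification option would typically replace the hand-written helper; the sketch only shows why the rarest class still reaches every split.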



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright 2001-2025 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.
