<div>
	Enhancing and extending CatBoost for accurate detection and classification of DoS and DDoS attack subtypes in network traffic</div>

Hajjouz Abdulkader, Avksentieva Elena Yu.

2025, Volume 25, Number 1 (January-February)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2025-25-1-114-127

Article in English

For citation:

Hajjouz A., Avksentieva E.Yu. Enhancing and extending CatBoost for accurate detection and classification of DoS and DDoS attack subtypes in network traffic. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2025, vol. 25, no. 1, pp. 114–127. doi: 10.17586/2226-1494-2025-25-1-114-127

Abstract

In a rapidly changing digital landscape, the rise of sophisticated cyber threats, especially DoS and DDoS attacks, poses a major challenge to information security. This paper addresses the problem of distinguishing malicious from benign network traffic using the CatBoost classifier, a machine learning algorithm optimized for categorical data and imbalanced datasets. Using the CIC-IDS2017 and CSE-CIC-IDS2018 datasets, which simulate various cyberattack scenarios, we optimized CatBoost to identify specific subtypes of DoS and DDoS attacks, including Hulk, SlowHTTPTest, GoldenEye, Slowloris, HOIC, LOIC-UDP-HTTP, and LOIT. The methodology comprised data preparation, feature selection, and model configuration, including normalizing outliers, correcting negative values, and refining dataset structures. Stratified sampling ensured a balanced representation of classes in the training, validation, and testing sets. The CatBoost model achieved an overall accuracy of 0.999922, with high precision, recall, and F1-scores across all categories, and it can process over 3.4 million samples per second. These results indicate that the model is robust and reliable for real-time intrusion detection. By classifying specific attack types, the model improves the precision of Intrusion Detection Systems (IDS) and allows a targeted response to different threats. The gain in detection accuracy addresses both the problem of imbalanced datasets and the need for granular detection of attack types. CatBoost can be used in advanced information security frameworks for critical infrastructure, cloud services, and enterprise networks to defend against digital threats. This paper provides a fast, accurate, and scalable solution for network IDS and shows the importance of custom machine learning models in information security. Future work should explore CatBoost on more datasets and integrate it with other machine learning techniques to improve robustness and detection.
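The abstract does not spell out how negative values were corrected or how the stratified split was implemented, so the following is only a minimal sketch of those two steps under stated assumptions. The function names `clip_negatives` and `stratified_split`, the rule of replacing negative values with zero, and the 70/15/15 split are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import random
from collections import defaultdict


def clip_negatives(rows):
    """Replace negative feature values with 0.0.

    Assumption: flow features such as durations or byte counts cannot be
    negative, so zero is used as the correction. The paper may use another rule.
    """
    return [[max(value, 0.0) for value in row] for row in rows]


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists that preserve class proportions.

    Each class is shuffled and cut separately, so rare attack subtypes
    appear in all three subsets in roughly the same ratio as in the full data.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # Toy labels: an imbalanced mix of benign traffic and two attack subtypes.
    labels = ["Benign"] * 100 + ["Hulk"] * 20 + ["Slowloris"] * 20
    train, val, test = stratified_split(labels)
    print(len(train), len(val), len(test))
```

The resulting index lists could then be used to slice the feature matrix before fitting a multiclass model such as `catboost.CatBoostClassifier`; the hyperparameters used by the authors are not given in the abstract and are not assumed here.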

Keywords: information security, network intrusion detection, DoS attacks, DDoS attacks, machine learning, real-time detection, feature selection, model optimization


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License