<div>
	Attacker group detection method based on HTTP payload analysis</div>

Artem V. Pavlov, Natalia V. Voloshina

2023, Volume 23, Number 3 (March-April)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2023-23-3-500-505

Article in Russian

For citation:

Pavlov A.V., Voloshina N.V. Attacker group detection method based on HTTP payload analysis. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2023, vol. 23, no. 3, pp. 500–505 (in Russian). doi: 10.17586/2226-1494-2023-23-3-500-505

Abstract

Attacks on web applications are a frequent attack vector against information resources and are used by attackers of various skill levels. Such attacks can be investigated by analyzing the HTTP requests the attackers make. This work studies whether groups of attackers can be identified by analyzing the payload of HTTP requests that an intrusion detection system (IDS) has marked as attack events. Identifying attacker groups supports security analysts who investigate and respond to incidents, reduces alert fatigue during security event analysis, and helps reveal attack patterns and the resources of intruders. The proposed method identifies attacker groups in a sequence of stages. First, requests are split into tokens by a regular expression built around features of the HTTP protocol and of attacks that intrusion detection systems frequently encounter and detect. The tokens are then weighted using the TF-IDF method, so that matches on rare words contribute more when requests are compared. Next, the main core of requests is separated out based on distance from the origin, measured as Manhattan distance. This sets apart the requests that contain no rare words, since it is the coincidence of rare words that indicates that events are connected. Finally, clustering is carried out using the DBSCAN method. It is shown that HTTP request payload data can be used to identify groups of attackers. An efficient method of tokenization, weighting and clustering of such data is proposed, with DBSCAN as the clustering step. The homogeneity, completeness and V-measure of the clusterings obtained by various methods were evaluated on the CPTC-2018 dataset. The proposed method yields a clustering of events with high homogeneity and sufficient completeness.
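The stages described above can be sketched in a few lines of standard-library Python. This is only an illustrative reading of the abstract, not the authors' implementation: the separator regex, the TF-IDF variant (raw count times log(N/df)), the sample requests, the names `tokenize`, `tfidf`, `dbscan` and `group_requests`, and the parameters `core_radius`, `eps` and `min_pts` are all assumptions made for the example. It also assumes that requests lying close to the origin are set aside before clustering.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: split on anything that is not a letter, digit,
# underscore, dot or hyphen. The article's actual regular expression is not
# given in the abstract.
SEPARATORS = re.compile(r"[^A-Za-z0-9_.\-]+")

def tokenize(request):
    return [t.lower() for t in SEPARATORS.split(request) if t]

def tfidf(docs):
    """One sparse {token: weight} vector per request: raw count * log(N / df)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def l1_norm(vec):
    """Manhattan distance from the origin."""
    return sum(abs(w) for w in vec.values())

def manhattan(a, b):
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in a.keys() | b.keys())

def dbscan(vectors, eps, min_pts):
    """Textbook DBSCAN with Manhattan distance; label -1 marks noise."""
    labels = [None] * len(vectors)

    def neighbours(i):
        return [j for j in range(len(vectors))
                if manhattan(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels

def group_requests(requests, core_radius, eps, min_pts):
    vectors = tfidf([tokenize(r) for r in requests])
    labels = [-1] * len(requests)
    # Set aside requests close to the origin (no rare tokens); cluster the rest.
    kept = [i for i, v in enumerate(vectors) if l1_norm(v) > core_radius]
    clustered = dbscan([vectors[i] for i in kept], eps, min_pts)
    for i, label in zip(kept, clustered):
        labels[i] = label
    return labels

# Made-up payloads: two SQL injection probes, two path traversal probes,
# and two requests without rare tokens.
sample_requests = [
    "/index.php?id=1+union+select+toolx",
    "/index.php?id=2+union+select+toolx",
    "/index.php?f=../etc/passwd+scany",
    "/index.php?f=../../etc/passwd+scany",
    "/index.php",
    "/index.php?id=1",
]
print(group_requests(sample_requests, core_radius=2.0, eps=3.0, min_pts=2))
```

With these toy thresholds the two injection probes fall into one cluster, the two traversal probes into another, and the last two requests are set aside with label -1. The thresholds were picked by hand for this sample and say nothing about values suitable for real IDS data.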
It is proposed to combine the resulting clustering with clusters obtained by other methods that also have high homogeneity, in order to raise completeness and V-measure while keeping homogeneity high. The proposed method can be used by security analysts in SOC, CERT and CSIRT teams, both for defending against intrusions, including APT, and for collecting data on attackers' techniques and tactics. The method makes it possible to identify patterns in the traces left by attackers' tools, which supports attribution of attacks.
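To make the metrics concrete, the following sketch computes homogeneity, completeness and V-measure (with beta = 1, following the conditional-entropy definition of Rosenberg and Hirschberg) for a toy labelling. The labels are invented, and the merge of two clusters is done by hand only to show how the metrics react. The abstract does not describe how the combination with other clusterings is actually performed.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given) for two equally long label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    marginal = Counter(given)
    return -sum(c / n * math.log(c / marginal[g]) for (g, _), c in joint.items())

def v_measure(classes, clusters):
    """Return (homogeneity, completeness, V-measure) with beta = 1."""
    h_c, h_k = entropy(classes), entropy(clusters)
    hom = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    com = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    v = 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
    return hom, com, v

# Hypothetical ground truth: four events from group A, two from group B.
truth = ["A", "A", "A", "A", "B", "B"]

# A clustering that is pure but splits group A into two clusters.
split = [0, 0, 1, 1, 2, 2]
print(v_measure(truth, split))

# The same clustering after clusters 0 and 1 have been merged.
merged = [0, 0, 0, 0, 2, 2]
print(v_measure(truth, merged))
```

In the split clustering every cluster contains a single group, so homogeneity is 1 while completeness is lower, because group A is spread over two clusters. Merging those two clusters raises completeness and V-measure to 1 without reducing homogeneity. This is the effect the proposed combination aims for.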

Keywords: attacker groups, complex attacks, intrusion detection, alert correlation

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License