Lightweight approach for malicious domain detection using machine learning

Ganesan Pradeepa, Radhakrishnan Devi

2022, Volume 22, Number 2 (March-April)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2022-22-2-262-268

Article in English

For citation:

Pradeepa G., Devi R. Lightweight approach for malicious domain detection using machine learning. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 2, pp. 262–268. doi: 10.17586/2226-1494-2022-22-2-262-268

Abstract

Web-based attacks exploit the vulnerabilities of end users and their systems to perform malicious activities such as stealing sensitive information, injecting malware, and redirecting users to malicious sites without their knowledge. Malicious website links are spread through social media posts, emails, and messages. The victim can be an individual or an organization, and such attacks cause large financial losses every year. A recent Internet security report states that 83% of systems on the Internet were infected by malware during the last 12 months because users were not aware of malicious URLs (Uniform Resource Locators) and their impact. Existing methods to detect and prevent access to malicious domain names fall into three categories: blacklist-based approaches, heuristic-based methods, and machine/deep learning-based methods. This study provides a lightweight machine learning-based solution to classify malicious domain names. Most existing research focuses on increasing the number of features to improve classification accuracy. In contrast, the proposed approach uses a smaller number of features, drawn from lexical, content-based, bag-of-words, and popularity features, for malicious domain classification. The experimental results show that the proposed approach performs better than the existing one.

Keywords: machine learning, lexical features, malicious domain, support vector, random forest, feature selection, cyber security
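
To make the idea of lexical features concrete, the following minimal Python sketch computes a handful of string-level features from a domain name. The specific features (length, label count, digit and hyphen counts, longest label, character entropy) and the function names `shannon_entropy` and `lexical_features` are illustrative assumptions; the abstract does not list the exact feature set used by the authors.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features of a domain name.

    These features are assumptions chosen for illustration, not the
    feature set reported in the article.
    """
    # Normalize: trim whitespace, lowercase, drop a trailing root dot.
    name = domain.strip().lower().rstrip(".")
    labels = name.split(".")
    return {
        "length": len(name),
        "num_labels": len(labels),
        "num_digits": sum(ch.isdigit() for ch in name),
        "num_hyphens": name.count("-"),
        "longest_label": max(len(label) for label in labels),
        "entropy": shannon_entropy(name),
    }


if __name__ == "__main__":
    # Hypothetical example domains, used only to show the output shape.
    for d in ("example.com", "secure-login-paypa1.example-verify.net"):
        print(d, lexical_features(d))
```

A feature dictionary of this kind could then be converted to a numeric vector and passed to a classifier such as a random forest or a support vector machine, both of which appear in the keywords; that step is omitted here because the abstract does not describe the training setup.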

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License