doi: 10.17586/2226-1494-2022-22-2-262-268


Lightweight approach for malicious domain detection using machine learning

G. Pradeepa, R. Devi


Article in English

For citation:
Pradeepa G., Devi R. Lightweight approach for malicious domain detection using machine learning. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2022, vol. 22, no. 2, pp. 262–268. doi: 10.17586/2226-1494-2022-22-2-262-268


Abstract

Web-based attacks exploit vulnerabilities of end users and their systems to carry out malicious activities such as stealing sensitive information, injecting malware, and redirecting users to malicious sites without their knowledge. Malicious website links are spread through social media posts, emails and messages. Victims can be individuals or organizations, and such attacks cause huge financial losses every year. A recent Internet security report states that 83 % of systems on the Internet were infected by malware during the previous 12 months because users were unaware of malicious URLs (Uniform Resource Locators) and their impact. Methods for detecting malicious domain names and preventing access to them fall into three categories: blacklist-based approaches, heuristic-based methods, and machine/deep learning-based methods. This study proposes a lightweight machine learning-based solution for classifying malicious domain names. Most existing research focuses on increasing the number of features to improve classification accuracy. The proposed approach, by contrast, uses fewer features, comprising lexical, content-based, bag-of-words and popularity features. Experimental results show that the proposed approach outperforms the existing approach.
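The abstract names the feature groups but not the individual features. As a minimal sketch, assuming a hypothetical feature set rather than the one used in the study, the lexical and bag-of-words groups for a single domain name could be computed in Python with the standard library alone. The names `lexical_features`, `char_ngrams`, `shannon_entropy` and the token list `SUSPICIOUS_TOKENS` are illustrative assumptions, not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical list of suspicious tokens; the study's actual vocabulary is not given.
SUSPICIOUS_TOKENS = ("login", "secure", "verify", "account", "update", "bank")


def shannon_entropy(text: str) -> float:
    """Shannon entropy (bits per character) of the characters in text."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_features(domain: str) -> dict:
    """Return a small, assumed set of lexical features for one domain name."""
    domain = domain.lower().strip().rstrip(".")
    labels = domain.split(".")
    return {
        "length": len(domain),                            # total characters
        "num_labels": len(labels),                        # dot-separated labels
        "num_digits": sum(ch.isdigit() for ch in domain),
        "num_hyphens": domain.count("-"),
        "longest_label": max(len(label) for label in labels),
        # Entropy of the name without dots; random-looking names score higher.
        "entropy": round(shannon_entropy(domain.replace(".", "")), 3),
        # True when the "domain" is a bare dotted IPv4-style address.
        "has_ip_form": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", domain)),
        # How many hypothetical suspicious tokens occur as substrings.
        "suspicious_tokens": sum(tok in domain for tok in SUSPICIOUS_TOKENS),
    }


def char_ngrams(domain: str, n: int = 2) -> Counter:
    """Bag of character n-grams, a simple bag-of-words view of a domain."""
    domain = domain.lower()
    return Counter(domain[i:i + n] for i in range(len(domain) - n + 1))


if __name__ == "__main__":
    print(lexical_features("secure-login.example.com"))
    print(char_ngrams("secure-login.example.com").most_common(3))
```

Vectors like these could then be passed to classifiers such as a random forest or a support vector machine (for example scikit-learn's `RandomForestClassifier` or `SVC`). Content-based and popularity features require external data (page content, traffic rankings) and are not sketched here.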


Keywords: machine learning, lexical features, malicious domain, support vector, random forest, feature selection, cyber security

References
  1. Warburton D. 2020 Phishing and Fraud Report. Available at: https://www.f5.com/labs/articles/threat-intelligence/2020-phishing-and-fraud-report (accessed: 11.11.2020).
  2. Saleem Raja A., Vinodini R., Kavitha A. Lexical features based malicious URL detection using machine learning techniques. Materials Today: Proceedings, 2021, vol. 47, part 1, pp. 163–166. https://doi.org/10.1016/j.matpr.2021.04.041
  3. Pradeepa G., Devi R. Review of malicious URL detection using machine learning. Advances in Intelligent Systems and Computing, 2021, vol. 1397, pp. 97–105. https://doi.org/10.1007/978-981-16-5301-8_7
  4. Joshi A., Lloyd L., Westin P., Seethapathy S. Using lexical features for malicious URL detection – a machine learning approach. arXiv, 2019, arXiv:1910.06277.
  5. Tupsamudre H., Singh A.K., Lodha S. Everything is in the name – a URL based approach for phishing detection. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2019, vol. 11527, pp. 231–248. https://doi.org/10.1007/978-3-030-20951-3_21
  6. Sahoo D., Liu C., Hoi S.C.H. Malicious URL Detection using Machine Learning: A Survey. arXiv, 2017, arXiv:1701.07179.
  7. Ma J., Saul L.K., Savage S., Voelker G.M. Identifying suspicious URLs: an application of large-scale online learning. Proc. of the 26th International Conference on Machine Learning (ICML), 2009, pp. 681–688. https://doi.org/10.1145/1553374.1553462
  8. McGrath D.K., Gupta M. Behind phishing: An examination of phisher modi operandi. Proc. of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More (LEET), 2008.
  9. Hou Y.-T., Chang Y., Chen T., Laih C.-S., Chen C.-M. Malicious web content detection by machine learning. Expert Systems with Applications, 2010, vol. 37, no. 1, pp. 55–60. https://doi.org/10.1016/j.eswa.2009.05.023
  10. Fu A.Y., Liu W., Deng X. Detecting phishing web pages with visual similarity assessment based on earth mover’s distance (EMD). IEEE Transactions on Dependable and Secure Computing, 2006, vol. 3, no. 4, pp. 301–311. https://doi.org/10.1109/TDSC.2006.50
  11. Sahingoz O.K., Buber E., Demir O., Diri B. Machine learning based phishing detection from URLs. Expert Systems with Applications, 2019, vol. 117, pp. 345–357. https://doi.org/10.1016/j.eswa.2018.09.029
  12. Patgiri R., Katari H., Kumar R., Sharma D. Empirical study on malicious URL detection using machine learning. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2019, vol. 11319, pp. 380–388. https://doi.org/10.1007/978-3-030-05366-6_31
  13. Xuan C.D., Nguyen H.D., Tisenko V.N. Malicious URL detection based on machine learning. International Journal of Advanced Computer Science and Applications (IJACSA), 2020, vol. 11, no. 1. https://doi.org/10.14569/IJACSA.2020.0110119
  14. Catak F.O., Sahinbas K., Dörtkardeş V. Malicious URL detection using machine learning. Artificial Intelligence Paradigms for Smart Cyber-Physical Systems, 2021, pp. 21. https://doi.org/10.4018/978-1-7998-5101-1.ch008
  15. Butnaru A., Mylonas A., Pitropakis N. Towards lightweight URL-based phishing detection. Future Internet, 2021, vol. 13, no. 6, pp. 154. https://doi.org/10.3390/fi13060154
  16. Brownlee J. How to choose a feature selection method for machine learning. Available at: https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/ (accessed: 20.08.2020).



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright 2001-2025 © Scientific and Technical Journal of Information Technologies, Mechanics and Optics. All rights reserved.
