doi: 10.17586/2226-1494-2026-26-2-367-377
An approach to contextual example mining for DGA domain identification using large language models
Article in Russian
For citation:
Menisov A.B., Morgunov V.M., Timashov P.V. An approach to contextual example mining for DGA domain identification using large language models. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2026, vol. 26, no. 2, pp. 367–377 (in Russian). doi: 10.17586/2226-1494-2026-26-2-367-377
Abstract
The article addresses the detection of domains produced by Domain Generation Algorithms (DGAs), which attackers widely use to build robust botnet control channels and covert communication. Traditional methods rely on manual feature engineering or specialized neural network architectures, which reduces their robustness to evolving DGA families. The scientific novelty of the proposed approach lies in using Large Language Models (LLMs) and their contextual adaptation mechanism to identify hidden patterns in domain names and classify them. In the developed approach, an LLM receives examples of legitimate and generated domains within its context. To improve efficiency, example selection strategies (TopK, VoteK) and various metrics of data homogeneity and variability are used. The influence of domain name length and entropy on the stability of the approach is also analyzed. The experiments were performed on a dataset that includes 68 DGA families and a subset of legitimate Tranco domains. The training set included 54 families, and testing covered all 68 families, including 14 previously unseen ones. The results showed the efficiency of the approach: precision = 0.93, recall = 0.95, and F1-measure = 0.94. The ability of the LLM to generalize rules to new DGA families was confirmed. Compared with existing methods, the proposed approach does not require additional retraining and provides flexibility through contextual adaptation. It demonstrated resistance to noise and the capability to detect new DGA families, which makes its application promising in the field of cybersecurity. At the same time, the study revealed the model's sensitivity to the length of domain names and the need for context balancing. Promising directions for development are the integration of additional features (DNS metadata, query time series) and methods for adaptation to stream processing.
Keywords: information security, DNS tunneling, domain generation algorithms, large language models, contextual adaptation
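The abstract describes placing labeled examples of legitimate and generated domains in the LLM context, choosing them with a TopK strategy, and analyzing domain length and entropy. The Python sketch below illustrates that general idea only; it is not the authors' implementation. The character-bigram Jaccard similarity, the prompt wording, the function names, and the sample domains are all assumptions made for illustration (the article does not specify them here), and the call to an actual LLM is left out.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Shannon entropy of a domain label, in bits per character."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def bigrams(label: str) -> set:
    """Set of character bigrams; a simple stand-in for a learned embedding."""
    return {label[i:i + 2] for i in range(len(label) - 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity over character bigrams (assumed metric)."""
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def select_top_k(query: str, pool: list, k: int) -> list:
    """Return the k labeled examples most similar to the query domain."""
    ranked = sorted(pool, key=lambda ex: similarity(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: list) -> str:
    """Assemble a few-shot prompt; the wording is hypothetical."""
    parts = ["Classify each domain as 'legit' or 'dga'."]
    for domain, label in examples:
        parts.append(f"Domain: {domain}\nLabel: {label}")
    parts.append(f"Domain: {query}\nLabel:")
    return "\n\n".join(parts)


# Made-up example pool; a real pool would come from the training families.
POOL = [
    ("google", "legit"),
    ("wikipedia", "legit"),
    ("xjkqzvbnwpt", "dga"),
    ("qwzxkjvbnmpl", "dga"),
]

if __name__ == "__main__":
    query = "zxkqjvbwnpt"
    shots = select_top_k(query, POOL, k=2)
    print(f"length={len(query)}, entropy={shannon_entropy(query):.2f} bits/char")
    print(build_prompt(query, shots))
```

Plain TopK selection as sketched here does not enforce class balance: for a random-looking query, both selected examples may carry the same label. Selecting the top examples per class would be one possible way to address the need for context balancing that the abstract mentions.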