An approach to contextual example mining for DGA domain identification using large language models

Artem B. Menisov, Vladimir M. Morgunov, Pavel V. Timashov

2026, Volume 26, Number 2 (March-April)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2026-26-2-367-377

Article in Russian

For citation:

Menisov A.B., Morgunov V.M., Timashov P.V. An approach to contextual example mining for DGA domain identification using large language models. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2026, vol. 26, no. 2, pp. 367–377 (in Russian). doi: 10.17586/2226-1494-2026-26-2-367-377

Abstract

The article addresses the problem of detecting domains generated by Domain Generation Algorithms (DGA), which attackers widely use to build resilient botnet control channels and covert communication. Traditional methods rely on manual feature engineering or specialized neural network architectures, which reduces their robustness to evolving DGA families. The scientific novelty of the proposed approach lies in applying Large Language Models (LLMs) and their contextual adaptation (in-context learning) mechanism to identify hidden patterns in domain names and classify them. In the developed approach, an LLM receives examples of legitimate and generated domains within its context. To improve efficiency, example selection strategies (TopK, VoteK) and various metrics of data homogeneity and variability are used. Additionally, the influence of domain name length and entropy on the stability of the approach is analyzed. The experiments are performed on a dataset comprising 68 DGA families and a subset of legitimate domains from the Tranco list. The training set included 54 families, and testing was carried out on all 68 families, including 14 previously unseen ones. The results demonstrate the effectiveness of the approach: precision = 0.93, recall = 0.95, and F1-measure = 0.94. The ability of the LLM to generalize rules to new DGA families is confirmed. Compared with existing methods, the proposed approach requires no additional retraining and offers flexibility through contextual adaptation. It demonstrated robustness to noise and the ability to detect new DGA families, which makes it promising for cybersecurity applications. At the same time, the model was found to be sensitive to domain name length, and the context needs to be balanced. Promising directions for further development include integrating additional features (DNS metadata, query time series) and adapting the method to stream processing.
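The abstract does not give implementation details, so the following is only a minimal sketch, under our own assumptions, of the ingredients it names: the Shannon entropy of a domain name, TopK selection of in-context examples, assembly of a few-shot prompt, and the precision/recall/F1 metrics. The similarity measure (cosine over character trigrams), the prompt wording, the labels `legit`/`dga`, and all function names are hypothetical; the authors' actual embeddings, prompt template, homogeneity/variability metrics and the VoteK strategy are not reproduced, and the LLM call itself is omitted.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating ideas named in the abstract; not the authors' code.


def shannon_entropy(name: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not name:
        return 0.0
    counts = Counter(name)
    n = len(name)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def char_ngrams(name: str, n: int = 3) -> Counter:
    """Multiset of character n-grams; '^' and '$' mark the start and end of the name."""
    padded = f"^{name}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topk_examples(query: str, pool: list[tuple[str, str]], k: int = 4) -> list[tuple[str, str]]:
    """TopK selection: the k labeled (domain, label) pairs most similar to the query."""
    q = char_ngrams(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, char_ngrams(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot prompt: instruction, labeled demonstrations, then the unlabeled query."""
    parts = ["Classify each domain as 'legit' or 'dga'."]
    parts += [f"Domain: {d}\nLabel: {label}" for d, label in examples]
    parts.append(f"Domain: {query}\nLabel:")
    return "\n\n".join(parts)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f1(y_true, y_pred, positive="dga"):
    """Precision, recall and F1 with `positive` as the positive (DGA) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Toy labeled pool; real experiments would draw from DGA families and Tranco domains.
    pool = [
        ("google.com", "legit"),
        ("wikipedia.org", "legit"),
        ("xjwqpzkvbn.net", "dga"),
        ("qzkxvbnwpj.info", "dga"),
    ]
    query = "googel.com"
    shots = topk_examples(query, pool, k=2)
    print(build_prompt(query, shots))  # this prompt would be sent to an LLM (not shown)
    print(len(query), round(shannon_entropy(query), 3))
```

With this toy pool, TopK places `google.com` first for the query `googel.com`, since the two names share six of their ten character trigrams while the randomly lettered names share none. As a sanity check on the reported metrics, `f1_score(0.93, 0.95)` rounds to 0.94, consistent with the F1-measure given in the abstract. Similarity-based TopK favors near-duplicates of the query; a diversity-aware strategy such as VoteK would instead spread the selected examples across the pool.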

Keywords: information security, DNS tunneling, domain generation algorithms, large language models, contextual adaptation

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License