A. A. Vorobeva, A. V. Gvozdev


Identifying anonymous web users is an increasingly important research task. The number of Internet users has grown dramatically, and the Internet is used more and more often for criminal purposes such as anonymous threats and extremist statements. Existing approaches and algorithms for identifying anonymous users are not sufficiently effective. In this work, user identification means recognizing an anonymous user on the Internet. Identification is performed by correlating the feature set of the anonymous user with previously collected features stored in a database. A user's feature set consists of technical features (IP address, OS version, etc.) and writing-style features (for short texts in Russian). We compared the discriminating power of three feature sets (technical, writing-style, and combined) and of three classification methods (support vector machines, neural networks, and logistic regression). The experimental results showed that using the combined feature set (writing-style and technical features) improves the accuracy of identifying an anonymous Internet user.

Keywords: anonymous users’ identification, text attribution, authorship of text messages, author attribution, computational linguistics, information security
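
The matching step described in the abstract, correlating an anonymous user's combined feature set with previously stored profiles, can be illustrated with a minimal sketch. It is not the authors' method: the article compares support vector machines, neural networks, and logistic regression, whereas this example substitutes a simple nearest-profile score to stay dependency-free. The three style features, the `weight` parameter, and all function and field names are assumptions made for illustration.

```python
import math
import re

def style_features(text):
    """Hypothetical writing-style vector for a short text (illustrative only)."""
    words = re.findall(r"\w+", text)  # \w matches Cyrillic letters in Python 3
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    return [
        sum(len(w) for w in words) / n_words,          # mean word length
        sum(text.count(c) for c in ",.!?") / n_chars,  # punctuation rate
        sum(ch.isupper() for ch in text) / n_chars,    # uppercase rate
    ]

def technical_similarity(a, b):
    """Fraction of technical attributes (e.g. IP address, OS) that match exactly."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

def style_similarity(u, v):
    """Map Euclidean distance between style vectors into the range (0, 1]."""
    return 1.0 / (1.0 + math.dist(u, v))

def identify(tech, text, profiles, weight=0.5):
    """Return (user, score) for the stored profile closest to the observation.

    `weight` balances technical against writing-style similarity; it is an
    assumed parameter, not a value taken from the article.
    """
    observed_style = style_features(text)
    best_user, best_score = None, -1.0
    for user, profile in profiles.items():
        score = (weight * technical_similarity(tech, profile["tech"])
                 + (1.0 - weight) * style_similarity(observed_style, profile["style"]))
        if score > best_score:
            best_user, best_score = user, score
    return best_user, best_score

# Made-up stored profiles: technical attributes plus a style vector
# computed from a previously collected message.
profiles = {
    "alice": {"tech": {"ip": "10.0.0.1", "os": "Windows 7"},
              "style": style_features("Hello, how are you? I am fine, thanks!")},
    "bob": {"tech": {"ip": "10.0.0.2", "os": "Linux"},
            "style": style_features("unbelievable circumstances notwithstanding everything continues")},
}

print(identify({"ip": "10.0.0.1", "os": "Windows 7"}, "Hi, are you ok? Yes, I am!", profiles))
```

Setting `weight=1.0` or `weight=0.0` reduces the score to the technical-only or writing-style-only case, which loosely mirrors the three feature sets compared in the article.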

Copyright 2001-2017 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.