Identifying anonymous web users is an increasingly important research task. The number of Internet users has grown dramatically, and use of the Internet for criminal purposes (such as anonymous threats and extremist statements) is becoming more frequent. Existing approaches and algorithms for identifying anonymous users are not sufficiently efficient. In the context of this work, user identification means recognizing an anonymous user on the Internet. Identification is performed by correlating the feature set of an anonymous user with previously collected features stored in a database. A user's feature set consists of technical features (IP address, OS version, etc.) and writing-style features (for short texts in Russian). We compared the discriminating power of three feature sets (technical, writing-style, and combined) and of three classification methods (support vector machines, neural networks, and logistic regression). The experimental results showed that using the combined feature set (writing-style and technical features) improves the accuracy of identifying an anonymous Internet user.

Keywords: anonymous users’ identification, text attribution, authorship of text messages, author attribution, computational linguistics, information security
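
The idea of correlating a combined feature set with stored profiles can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the concrete features, so the three toy writing-style features, the two-octet IP encoding, the one-hot OS vector, the example users, and all function names are hypothetical. The matching step uses plain cosine similarity against stored vectors as a simple stand-in; it is not one of the three classifiers compared in this work (support vector machines, neural networks, logistic regression).

```python
import math
import re

# Hypothetical list of operating systems used for one-hot encoding.
KNOWN_OS = ["Windows", "Linux", "Android"]


def style_features(text):
    """Toy writing-style features (assumed, not the paper's feature set)."""
    words = re.findall(r"\w+", text)  # \w is Unicode-aware, so Cyrillic words match
    n_chars = max(len(text), 1)
    mean_word_len = sum(len(w) for w in words) / max(len(words), 1)
    punct_rate = sum(ch in ".,!?;:-" for ch in text) / n_chars
    upper_rate = sum(ch.isupper() for ch in text) / n_chars
    return [mean_word_len / 10.0, punct_rate, upper_rate]


def technical_features(ip, os_name):
    """Toy technical features: first two IPv4 octets scaled to [0, 1] plus one-hot OS."""
    octets = [int(part) / 255.0 for part in ip.split(".")[:2]]
    os_one_hot = [1.0 if os_name == name else 0.0 for name in KNOWN_OS]
    return octets + os_one_hot


def combined_features(text, ip, os_name):
    """Concatenate technical and writing-style features into one vector."""
    return technical_features(ip, os_name) + style_features(text)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(query, profiles):
    """Return the stored user whose feature vector is most similar to the query."""
    return max(profiles, key=lambda user: cosine(query, profiles[user]))


# Previously collected profiles (the "database"); users and data are made up.
profiles = {
    "user_a": combined_features("Привет! Как дела? Пиши, жду.", "10.20.1.5", "Windows"),
    "user_b": combined_features(
        "добрый день уважаемые коллеги отправляю документы", "192.168.0.7", "Linux"
    ),
}

# Features observed for an anonymous message.
query = combined_features("Привет! Ну что, как ты? Жду ответа.", "10.20.3.9", "Windows")

print(identify(query, profiles))  # user_a
```

In a setup closer to the experiment described above, the same combined vectors would be passed to a trained classifier rather than compared by similarity, and the technical-only and writing-style-only vectors would be evaluated separately to measure what each feature set contributes.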

