Social media user identity linkage by graphic content comparison

Anastasia A. Korepanova, Maxim V. Abramov, Aleksander L. Tulupyev

2021, Volume 21, Number 6 (November-December)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2021-21-6-942-950

Article in Russian

For citation:

Korepanova A.A., Abramov M.V., Tulupyev A.L. Social media user identity linkage by graphic content comparison. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2021, vol. 21, no. 6, pp. 942–950 (in Russian). doi: 10.17586/2226-1494-2021-21-6-942-950

Abstract

The article proposes a new approach to comparing accounts on the social networks VKontakte and Instagram in order to identify accounts that belong to the same user. The approach is based on comparing graphic content. Its novelty lies in combining several methods for matching graphic content; in addition, a method for matching accounts across these two social networks is proposed for the first time. The proposed method combines three ways of matching graphic content: extracting the faces of the account owners from the photos in each account and matching them, matching all faces found in both accounts, and comparing images pairwise with the perceptual hash (pHash) method to find identical images in both accounts. The method was tested on a dataset of more than 8,000 pairs of accounts, where the F1-score reached 0.87. The practical significance lies in automating the comparison of user accounts across social networks by implementing the developed algorithm in a prototype of the software package. A further direction for research is to expand the dataset and the set of profile attributes considered for comparison. The results can be incorporated into a software package for analyzing how well users of information systems are protected against social engineering attacks. Combining these findings with account matching methods based on the structural similarity of social graphs appears promising.
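
The pairwise image comparison step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 32x32 input size, the 8x8 low-frequency block, the median threshold, and the `max_distance=10` cutoff are assumptions drawn from common descriptions of pHash, and all function names are hypothetical. The input is assumed to be a grayscale image that has already been resized to a square matrix; the demo images are synthetic.

```python
import math
import random
from statistics import median


def dct_2d(matrix):
    """Naive, unnormalized 2-D DCT-II of a square matrix (list of lists)."""
    n = len(matrix)
    # basis[k][i] = cos(pi * (2i + 1) * k / (2n))
    basis = [[math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
             for k in range(n)]
    # Transform along each row first, then along each column.
    rows = [[sum(row[i] * basis[k][i] for i in range(n)) for k in range(n)]
            for row in matrix]
    return [[sum(rows[i][v] * basis[u][i] for i in range(n)) for v in range(n)]
            for u in range(n)]


def phash_bits(gray, hash_size=8):
    """Return a list of hash_size**2 bits for a square grayscale matrix."""
    coeffs = dct_2d(gray)
    # Keep only the low-frequency (top-left) block of coefficients.
    low = [coeffs[u][v] for u in range(hash_size) for v in range(hash_size)]
    # Threshold at the median, ignoring the DC term (overall brightness).
    threshold = median(low[1:])
    return [1 if c > threshold else 0 for c in low]


def hamming(a, b):
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def same_image(gray_a, gray_b, max_distance=10):
    """Treat two images as the same if their hashes differ in few bits.

    max_distance is an assumed cutoff, not a value from the article.
    """
    return hamming(phash_bits(gray_a), phash_bits(gray_b)) <= max_distance


# Synthetic demo data: a random image, a brightened copy, an unrelated image.
rng = random.Random(0)
base = [[rng.randint(0, 200) for _ in range(32)] for _ in range(32)]
brighter = [[p + 20 for p in row] for row in base]
other = [[rng.randint(0, 200) for _ in range(32)] for _ in range(32)]

print("base vs brighter:", hamming(phash_bits(base), phash_bits(brighter)))
print("base vs other:", hamming(phash_bits(base), phash_bits(other)))
```

A uniform brightness change affects only the DC coefficient, which is why the brightened copy hashes almost identically to the original. In practice, library routines for resizing and for the DCT would replace the naive loops used here.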

Keywords: social media, user identity linkage, image processing, machine learning, social engineering attacks

Acknowledgements. This work was carried out under the state assignment of SPC RAS SPIIRAS, project No. 0073-2019-0003 (development of the approach); it was supported by Saint Petersburg State University, project No. 73555239 (implementation and evaluation of the approach), and by the RFBR, project No. 20-07-00839 (evaluation of the results in the prototype of the software package).

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License