Identification of user accounts by image comparison: the pHash-based approach

Valerii D. Oliseenko, Maxim V. Abramov, Aleksander L. Tulupyev

2021, Volume 21, Number 4 (July-August)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

Editor-in-Chief: Vladimir O. Nikiforov, D.Sc., Prof.

doi: 10.17586/2226-1494-2021-21-4-562-570

For citation:

Oliseenko V.D., Abramov M.V., Tulupyev A.L. Identification of user accounts by image comparison: the pHash-based approach. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2021, vol. 21, no. 4, pp. 562–570 (in Russian). doi: 10.17586/2226-1494-2021-21-4-562-570

Abstract

The study presents a new approach to identifying users across online social networks by matching accounts that belong to the same person. The approach relies on images extracted from users’ digital footprints. It compares not only the main profile images but also all elements of the graphic content published in each account. The images published in two accounts from different online social networks are compared pairwise on the “all-to-all” principle to assess the probability that the accounts belong to the same user. The labeled graphic content elements are compared using the well-known perceptual hash method pHash. A computational experiment was conducted to evaluate the proposed approach; the F1-score reached 0.886 for three matched images. The results show that pHash image comparison can be used for account identification both as a standalone approach and as a complement to other identification approaches, including existing methods for comparative analysis of accounts. Automating the approach provides a tool for aggregating accounts and makes it possible to obtain more information about users and to assess their personality traits in greater depth. The results can be applied to building a digital twin of the user, which can then be used to describe his or her traits in tasks such as protection against social engineering attacks, targeted advertising, and assessment of creditworthiness, as well as in other studies related to online social networks and the social sciences.

Keywords: online social networks, user identification, image processing, pHash, data science, social engineering attacks
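
The all-to-all comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each image has already been reduced to a 64-bit pHash value stored as an integer, and that two images count as matched when the Hamming distance between their hashes does not exceed a threshold. The function names, the threshold max_distance=10, and the rule of counting each image of the first account at most once are illustrative assumptions; only the figure of three matched images echoes the abstract.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def count_matched_images(hashes_a, hashes_b, max_distance=10):
    """Compare every hash of account A with every hash of account B.

    An image of account A counts as matched (once) if at least one image
    of account B lies within max_distance bits of it. The threshold is an
    assumption of this sketch, not a value taken from the article.
    """
    matched = 0
    for hash_a in hashes_a:
        if any(hamming_distance(hash_a, hash_b) <= max_distance
               for hash_b in hashes_b):
            matched += 1
    return matched


def accounts_match(hashes_a, hashes_b, min_matches=3, max_distance=10):
    """Hypothetical decision rule: treat two accounts as belonging to the
    same user when at least min_matches images are matched."""
    return count_matched_images(hashes_a, hashes_b, max_distance) >= min_matches


# Made-up 64-bit hash values standing in for real pHash output.
account_a = [0xFFFF0000FFFF0000, 0x0F0F0F0F0F0F0F0F,
             0x123456789ABCDEF0, 0xAAAAAAAAAAAAAAAA]
account_b = [0xFFFF0000FFFF0001, 0x0F0F0F0F0F0F0F00,
             0x123456789ABCDEF3, 0x5555555555555555]

matched = count_matched_images(account_a, account_b)
same_user = accounts_match(account_a, account_b)
```

In practice the hash values would come from a pHash implementation applied to the downloaded images, and both thresholds would be tuned on labeled account pairs, for example by maximizing the F1-score.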

Acknowledgements. This work was carried out under the state assignment of SPC RAS SPIIRAS, project No. 0073-2019-0003 (formulation of the approach); supported by Saint Petersburg State University, project No. 73555239 (implementation and validation of the approach); and financially supported by the RFBR, project No. 20-07-00839 (validation of the results in a prototype of the software package).

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License