doi: 10.17586/2226-1494-2021-21-5-727-737


Machine learning of the Bayesian belief network as a tool for evaluating the process frequency on social network data

A. V. Toropova, M. V. Abramov, T. V. Tulupyeva


Article in Russian

For citation:
Toropova A.V., Abramov M.V., Tulupyeva T.V. Machine learning of the Bayesian belief network as a tool for evaluating the process frequency on social network data. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2021, vol. 21, no. 5, pp. 727–737 (in Russian). doi: 10.17586/2226-1494-2021-21-5-727-737


Abstract
The paper considers the problem of estimating the frequency of processes modeled as stochastic processes consisting of a series of sequential episodes, with a known class of distributions for the length of the time interval between episodes. In the previously proposed approach, the input data included the length of the interval between the last episode and the end of the study period, which could lead to inaccurate results. This interval differs from the intervals between successive episodes: it ends at the boundary of the study period rather than at the next episode. Representing and processing it therefore requires approaches that take this feature into account. The accuracy of process frequency estimates was improved by developing a new model based on a Bayesian belief network. The network includes nodes corresponding to the intervals between the last episodes of the process and to the minimum and maximum intervals between episodes, and it correctly accounts for the interval between the last episode and the end of the study period at the model training stage. The authors propose a Bayesian belief network that includes a random element characterizing the interval between the last episode of the process within the study period and the end of that period; data on this interval can be available at the training stage. The Bayesian belief network was modeled in R using the bnlearn package. The proposed approach estimates process frequency with a Bayesian belief network learned by machine learning methods. It increases the accuracy of the results by correctly accounting for the interval between the last episode and the end of the study period through a special scheme in the learned network that includes a “hypothetical” episode occurring after the end of the study period.
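A minimal Python sketch of the interval quantities described above, assuming episode times are measured in days from the start of the study period (taken as 0). The names `extract_features`, `IntervalFeatures` and `k_last` are hypothetical and do not come from the paper, whose model was built in R with bnlearn. The sketch only shows how the censored gap (last episode to the end of the period), which is always observable, differs from the full gap to the first episode after the period, which is known only for training data.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class IntervalFeatures:
    frequency: float                 # episodes per day inside the study period
    last_intervals: List[float]      # intervals between the last few episodes
    min_interval: Optional[float]    # None if fewer than two episodes
    max_interval: Optional[float]
    censored_gap: float              # last episode -> end of the period (always known)
    full_gap: Optional[float]        # last episode -> first episode after the period


def extract_features(times: Sequence[float], period_end: float,
                     next_episode: Optional[float] = None,
                     k_last: int = 2) -> IntervalFeatures:
    """Interval features of one process observed on [0, period_end].

    next_episode is the first episode after the period (the "hypothetical"
    episode); it is known only for training data and stays None at prediction time.
    """
    times = sorted(times)
    if next_episode is not None and next_episode <= period_end:
        raise ValueError("next_episode must fall after the study period")
    gaps = [b - a for a, b in zip(times, times[1:])]
    last = times[-1] if times else 0.0   # no episodes: measure from the period start
    return IntervalFeatures(
        frequency=len(times) / period_end,
        last_intervals=gaps[max(len(gaps) - k_last, 0):],
        min_interval=min(gaps) if gaps else None,
        max_interval=max(gaps) if gaps else None,
        censored_gap=period_end - last,
        full_gap=None if next_episode is None else next_episode - last,
    )


# Days counted from the start of 2020 (a 366-day year); day 380 falls in January 2021.
print(extract_features([10, 40, 55, 300], period_end=366, next_episode=380))
```

For example, posts on days 10, 40, 55 and 300 with the next post on day 380 give intervals of 30, 15 and 245 days, a censored gap of 66 days and a full gap of 80 days. The censored gap can never exceed the full gap, which is why treating it as an ordinary inter-episode interval distorts the picture.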
To test the proposed approach, data were collected on 5608 Instagram users, including the times of their posts in 2020 and the time of their first post in 2021, which marks the first episode after the end of the study period. Of this sample, 70 % was used to train the model, and 30 % was used to compare the posting frequencies predicted by the model with the known values. The results can be used in various fields of science where a process frequency has to be estimated under information deficit, that is, when the process is observed only for a limited time. Such estimates are often needed in medicine, epidemiology, sociology and other fields. The comparison of the theoretical model with the results of learning from social network data shows good agreement, which makes it possible to automate obtaining process frequency estimates.
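The training and evaluation scheme can be illustrated with a self-contained Python sketch on synthetic data. Everything in it is an assumption made for illustration, not the authors' model or data. Episodes follow a Poisson process with gamma-distributed rates, and intervals are sorted into hypothetical bins. Conditional probability tables are estimated by counting with add-one smoothing, in place of bnlearn. The structure is a simplified guess: the rate class is the parent of the minimum interval, the maximum interval and the full gap to the first post-period episode, and that full gap is the parent of the censored gap. The full gap is observed in the 70 % training part and summed out for the 30 % held-out part, which mirrors the role of the “hypothetical” episode.

```python
import bisect
import random
from collections import Counter, defaultdict

T = 366.0                               # study period in days (2020 is a leap year)
GAP_EDGES = [1, 3, 7, 14, 30, 90, 180]  # hypothetical interval bins, days
NONE_BIN = len(GAP_EDGES) + 1           # state for "fewer than two episodes"
N_GAP = len(GAP_EDGES) + 2
RATE_EDGES = [0.01, 0.05, 0.2]          # hypothetical rate classes, episodes per day
N_RATE = len(RATE_EDGES) + 1


def gap_bin(x):
    return bisect.bisect_right(GAP_EDGES, x)


def simulate_user(rng):
    """One synthetic user: Poisson episodes with a gamma-distributed rate (assumption)."""
    lam = max(rng.gammavariate(0.8, 0.1), 1e-4)   # episodes per day
    times, t = [], rng.expovariate(lam)
    while t <= T:
        times.append(t)
        t += rng.expovariate(lam)
    # t is now the first episode after the study period (the "hypothetical" one)
    gaps = [b - a for a, b in zip(times, times[1:])]
    last = times[-1] if times else 0.0
    mn = gap_bin(min(gaps)) if gaps else NONE_BIN
    mx = gap_bin(max(gaps)) if gaps else NONE_BIN
    censored = gap_bin(T - last)   # observable at prediction time
    full = gap_bin(t - last)       # observable only for training data
    return bisect.bisect_right(RATE_EDGES, lam), mn, mx, censored, full


def fit_cpt(pairs, n_child):
    """Estimate P(child | parent) by counting with add-one smoothing."""
    counts = defaultdict(Counter)
    for parent, child in pairs:
        counts[parent][child] += 1

    def prob(child, parent):
        c = counts[parent]
        return (c[child] + 1) / (sum(c.values()) + n_child)
    return prob


def fit(data):
    """CPTs for the hypothetical structure R->min, R->max, R->G (full gap), G->C (censored)."""
    prior = fit_cpt([(None, r) for r, _, _, _, _ in data], N_RATE)
    p_min = fit_cpt([(r, mn) for r, mn, _, _, _ in data], N_GAP)
    p_max = fit_cpt([(r, mx) for r, _, mx, _, _ in data], N_GAP)
    p_full = fit_cpt([(r, g) for r, _, _, _, g in data], N_GAP)
    p_cens = fit_cpt([(g, c) for _, _, _, c, g in data], N_GAP)
    return prior, p_min, p_max, p_full, p_cens


def posterior(model, mn, mx, censored):
    """P(rate class | min, max, censored gap); the hidden full gap is summed out."""
    prior, p_min, p_max, p_full, p_cens = model
    scores = []
    for r in range(N_RATE):
        lik = sum(p_full(g, r) * p_cens(censored, g) for g in range(N_GAP))
        scores.append(prior(r, None) * p_min(mn, r) * p_max(mx, r) * lik)
    total = sum(scores)
    return [s / total for s in scores]


rng = random.Random(2021)
users = [simulate_user(rng) for _ in range(5000)]
split = len(users) * 7 // 10          # 70 % train, 30 % held out
train, test = users[:split], users[split:]
model = fit(train)

correct = 0
for rate, mn, mx, censored, _full in test:   # the full gap is NOT used for held-out users
    post = posterior(model, mn, mx, censored)
    correct += max(range(N_RATE), key=post.__getitem__) == rate
accuracy = correct / len(test)
print(f"held-out rate-class accuracy: {accuracy:.2f}")
```

The edge from the full gap to the censored gap encodes a simplifying assumption: once the length of the interval straddling the end of the period is known, the observed part of it adds no further information about the rate. This is reasonable for the Poisson process simulated here, but other process classes may need a different structure.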

Keywords: process frequency, frequency estimation, Bayesian belief networks, process episodes, stochastic process

Acknowledgements. The work was carried out within the framework of the project under the state assignment of St. Petersburg Federal Research Center of the Russian Academy of Sciences No. 0073-2019-0003 and with financial support from the Russian Foundation for Basic Research, projects No. 19-37-90120, No. 18-01-00626 and No. 20-07-00839.

References
1. Suvorova A.V., Tulupyev A.L. Bayesian belief network structure synthesis for risky behavior rate estimation. Informatsionno-Upravliaiushchie Sistemy, 2018, no. 1, pp. 116–122. (in Russian). https://doi.org/10.15217/issn1684-8853.2018.1.116
2. Conners E.E., West B.S., Roth A.M., Meckel-Parker K.G., Kwan M.-P., Magis-Rodriguez C., Staines-Orozco H., Clapp J.D., Brouwer K.C. Quantitative, qualitative and geospatial methods to characterize HIV risk environments. PLoS ONE, 2016, vol. 11, no. 5, pp. e0155693. https://doi.org/10.1371/journal.pone.0155693
3. Abramov M.V., Tulupeva T.V., Tulupev A.L. Social Engineering Attacks: Social Media and Users Security Estimates. St. Petersburg, SUAI, 2018, 266 p. (in Russian)
4. Skinner B.F. Science and Human Behavior. Free Press, 1965, 461 p.
5. Suvorova A.V. Socially significant behavior modeling on the base of super-short incomplete set of observations. Information-measuring and Control Systems, 2013, vol. 11, no. 9, pp. 34–37. (in Russian)
6. Toropova A.V., Suvorova A.V., Tulupyev A.L. Model for socially significant behavior rate estimate: consistency diagnostics. Fuzzy Systems and Soft Computing, 2015, vol. 10, no. 1, pp. 93–107. (in Russian)
7. Friman P.C. Cooper, Heron, and Heward's Applied Behavior Analysis (2nd edition): Checkered flag for students and professors, yellow flag for the field. Journal of Applied Behavior Analysis, 2010, vol. 43, no. 1, pp. 161–174. https://doi.org/10.1901/jaba.2010.43-161
8. Bolger N., Davis A., Rafaeli E. Diary methods: capturing life as it is lived. Annual Review of Psychology, 2003, vol. 54, pp. 579–616. https://doi.org/10.1146/annurev.psych.54.101601.145030
9. Graham C.A., Catania J.A., Brand R., Duong T., Canchola J.A. Recalling sexual behavior: A methodological analysis of memory recall bias via interview using the diary as the gold standard. Journal of Sex Research, 2003, vol. 40, no. 4, pp. 325–332. https://doi.org/10.1080/00224490209552198
10. Kuleshov S., Zaytseva A., Aksenov A. Natural language search and associative-ontology matching algorithms based on graph representation of texts. Advances in Intelligent Systems and Computing, 2019, vol. 1046, pp. 285–294. https://doi.org/10.1007/978-3-030-30329-7_26
11. Tulupev A.L., Sirotkin A.V., Nikolenko S.I. Bayesian Belief Networks. St. Petersburg, St Petersburg University Publ., 2009, 399 p. (in Russian)
12. Dai J., Ren J., Du W. Decomposition-based Bayesian network structure learning algorithm using local topology information. Knowledge-Based Systems, 2020, vol. 195, pp. 105602. https://doi.org/10.1016/j.knosys.2020.105602
13. Bareinboim E., Pearl J. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences of the United States of America, 2016, vol. 113, no. 27, pp. 7345–7352. https://doi.org/10.1073/pnas.1510507113
14. Chen C., Zhang L., Tiong R.L.K. A novel learning cloud Bayesian network for risk measurement. Applied Soft Computing Journal, 2020, vol. 87, pp. 105947. https://doi.org/10.1016/j.asoc.2019.105947
15. Cobb B.R., Li L. Bayesian network model for quality control with categorical attribute data. Applied Soft Computing Journal, 2019, vol. 84, pp. 105746. https://doi.org/10.1016/j.asoc.2019.105746
16. He R., Tian J., Wu H. Structure learning in Bayesian networks of a moderate size by efficient sampling. Journal of Machine Learning Research, 2016, vol. 17, pp. 1–54.
17. Kabir G., Demissie G., Sadiq R., Tesfamariam S. Integrating failure prediction models for water mains: Bayesian belief network based data fusion. Knowledge-Based Systems, 2015, vol. 85, pp. 159–169. https://doi.org/10.1016/j.knosys.2015.05.002
18. Toropova A., Tulupyeva T. Synthesis and learning of socially significant behavior model with hidden variables. Advances in Intelligent Systems and Computing, 2019, vol. 875, pp. 76–84. https://doi.org/10.1007/978-3-030-01821-4_9
19. Jabeen S., Kausar R. Obsessive compulsive disorder: frequency and gender estimates. Pakistan Journal of Medical Sciences, 2020, vol. 36, no. 5, pp. 1048–1052. https://doi.org/10.12669/pjms.36.5.1870
20. Kugeler K.J., Schwartz A.M., Delorey M.J., Mead P.S., Hinckley A.F. Estimating the frequency of Lyme disease diagnoses, United States, 2010–2018. Emerging Infectious Diseases, 2021, vol. 27, no. 2, pp. 616–619. https://doi.org/10.3201/eid2702.202731
21. Wolfson J.A., Ishikawa Y., Hosokawa C., Janisch K., Massa J., Eisenberg D.M. Gender differences in global estimates of cooking frequency prior to COVID-19. Appetite, 2021, vol. 161, pp. 105117. https://doi.org/10.1016/j.appet.2021.105117
22. Cano-Lozano M.C., León S.P., Contreras L. Child-to-Parent violence: examining the frequency and reasons in Spanish youth. Family Relations, 2021, in press. https://doi.org/10.1111/fare.12567
23. Nieto-García M., Muñoz-Gallego P.A., Gonzalez-Benito Ó. The more the merrier? Understanding how travel frequency shapes willingness to pay. Cornell Hospitality Quarterly, 2020, vol. 61, no. 4, pp. 401–415. https://doi.org/10.1177/1938965519899932
24. Zelterman D., Tulupyev A.L., Suvorova A.V., Paschenko A.E., Musina V.F., Tulupyeva T.V., Krasnoselskikh T.V., Grau L.E., Heimer R. Processing length bias of time intervals between the last episode and the interview in gamma-Poisson models of behavior. SPIIRAS Proceedings, 2011, no. 1, pp. 160–185. (in Russian). https://doi.org/10.15622/sp.16.6
25. Stepanov D.V., Musina V.F., Suvorova A.V., Tulupyev A.L., Sirotkin A.V., Tulupyeva T.V. Risky behavior Poisson model identification: heterogeneous arguments in likelihood. SPIIRAS Proceedings, 2012, no. 4, pp. 157–184. (in Russian). https://doi.org/10.15622/sp.23.9
26. Iarushkina N.G. Predicative analytics based on fuzzy time series. "Integrated models and soft computing in artificial intelligence". Proceedings of the 10th International Scientific and Technical Conference. V. 1. Kolomna, May 17-20, 2021. Smolensk, Universum Publ., 2021, pp. 116–128. (in Russian)
27. Özkaya U., Yiğit E., Seyfi L., Öztürk S., Singh D. Comparative regression analysis for estimating resonant frequency of c-like patch antennas. Mathematical Problems in Engineering, 2021, vol. 2021, pp. 6903925. https://doi.org/10.1155/2021/6903925
28. Osipov V.Y., Vodyaho A.I., Zhukova N.A., Glebovsky P.A. Multilevel automatic synthesis of behavioral programs for smart devices. Proc. 2017 International Conference on Control, Artificial Intelligence, Robotics and Optimization (ICCAIRO), 2017, pp. 335–340. https://doi.org/10.1109/ICCAIRO.2017.68
29. Desmond N., Nagelkerke N., Lora W., Chipeta E., Sambo M., Kumwenda M., Corbett E.L., Taegtemeyer M., Seeley J., Lalloo D.G., Theobald S. Measuring sexual behaviour in Malawi: a triangulation of three data collection instruments. BMC Public Health, 2018, vol. 18, no. 1, pp. 807. https://doi.org/10.1186/s12889-018-5717-x
30. Suvorova A., Belyakov A., Makhamatova A., Ustinov A., Levina O., Tulupyev A., Niccolai L., Rassokhin V., Heimer R. Comparison of satisfaction with care between two different models of HIV care delivery in St. Petersburg, Russia. AIDS Care, 2015, vol. 27, no. 10, pp. 1309–1316. https://doi.org/10.1080/09540121.2015.1054337
31. Shane-Simpson C., Schwartz A.M., Abi-Habib R., Tohme P., Obeid R. I love my selfie! An investigation of overt and covert narcissism to understand selfie-posting behaviors within three geographic communities. Computers in Human Behavior, 2020, vol. 104, pp. 106158. https://doi.org/10.1016/j.chb.2019.106158
32. Chen S.X., Lam B.C.P., Hui B.P.H., Ng J.C.K., Mak W.W.S., Guan Y., Buchtel E.E., Tang W.C.S., Lau V.C.Y. Conceptualizing psychological processes in response to globalization: Components, antecedents, and consequences of global orientations. Journal of Personality and Social Psychology, 2016, vol. 110, no. 2, pp. 302–331. https://doi.org/10.1037/a0039647



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright 2001-2024 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.
