**Nikiforov**

Vladimir O.

D.Sc., Prof.

Vladimir O.

D.Sc., Prof.

doi: 10.17586/2226-1494-2016-16-1-122-132

doi: 10.17586/2226-1494-2016-16-1-122-132

# SYNTHESIS OF THE SECONDARY STRUCTURE OF ALGEBRAIC BAYESIAN NETWORKS: AN INCREMENTAL ALGORITHM AND STATISTICAL ESTIMATION OF ITS COMPLEXITY

**Read the full article**';

**Article in**Russian

**For citation:**Zotov M.A., Levenets D.G., Tulupyev A.L., Zolotin A.A. Synthesis of the secondary structure of algebraic Bayesian networks: an incremental algorithm and statistical estimation of its complexity.

*Scientific and Technical Journal of Information Technologies, Mechanics and Optics*, 2016, vol. 16, no. 1, pp. 122–132.

**Abstract**

An improved algorithm for the synthesis of the secondary structure of algebraic Bayesian networks represented by a minimal join graph is proposed in the paper. The algorithm differs from the previously offered one so that it relies on the incremental principle, uses specially selected edges and, finally, eliminates redundant edges by a greedy algorithm. The correct operation of the incremental algorithm is mathematically proved. Comparison of the computational complexity of the new (incremental) algorithm implementation and two well-known (greedy and direct) is made by means of statistical estimates of complexity, based on the sample values of the runtime ratio of software implementations of two compared algorithms. Theoretical complexity estimates of the greedy and direct algorithms have been obtained earlier, but are not suitable for comparative analysis, as they are based on the hidden characteristics of the secondary structure, which can be calculated only when it is built. To minimize the influence of random factors at calculating the ratio average program runtime is used obtained by N launches on the same set of workloads. The sample values of ratio is formed for M sets of equal power K. According to the sample values the median is calculated, as well as the other statistics that characterize the spread: borders of the 97% confidence interval along with the first and the third quartiles. Sets of loads are stochastically generated according to the specified parameters using the algorithm described in the paper. The stochastic algorithms generating a set of loads with given power, as well as collecting the statistical data and calculating of statistical estimates of the ratio of forward and greedy algorithm to the incremental algorithm runtimes is described in the paper. A series of experiments is carried out in which N is changed in the range 1, 2 ... 9, 10, 26, 42 ... 170.They have showed that the incremental algorithm speed exceeds the forward and greedy ones, moreover in the 10-170 load sets power range this finding is statistically significant (97% level). The results of experiments are visualized using a graphs library Highcharts. The developed incremental algorithm is designed for application in problems solving of algebraic Bayesian networks machine learning.

**Keywords:**Bayesian networks, secondary structure synthesis, greedy algorithm, incremental algorithm, computational complexity, statistical estimate, minimal joint graph

**Acknowledgements.**The paper contains results of research partially supported with RFBR grant No.15-01-09001 – «Combined probabilistic-logic graphical approach to representation and processing of uncertain knowledge systems: algebraical Bayesian networks and related models».

**References**

1. Wang X.D. Data mining in network engineering-Bayesian networks for data mining. *Proc. Int. Conf. on Education, Management, Commerce and Society*. Shenyang, China, 2015, vol. 17, pp. 412–417.

2. Benferhat S., Tabia K. Inference in possibilistic network classifiers under uncertain observations. *Annals of Mathematics and Artificial Intelligence,* 2012, vol. 64, no. 2–3, pp. 269–309. doi: 10.1007/s10472-012-9290-1

3. Stratford D.S., Croft K.M., Pollino C.A. Knowledge representation using Bayesian networks and ontologies. *Proc. 20th Int. Congress on Modelling and Simulation, ModSim 2013*. Adelaide, Australia, 2013, pp. 2980–2986.

4. Lockamy A., McCormack K. Modeling supplier risks using Bayesian networks.* Industrial Management & Data Systems*, 2012, vol. 112, no. 1–2, pp. 313–333. doi: 10.1108/02635571211204317

5. Axelrad E.T., Sticha P.J., Brdiczka O., Shen J.Q. A Bayesian network model for predicting insider threats. *Proc. 2nd IEEE CS Security and Privacy Workshops, SPW 2013.* San Francisco, USA, 2013, pp. 82–89. doi: 10.1109/SPW.2013.35

6. Hu Y., Zhang X.Z., Ngai E.W.T., Cai R.C., Liu M. Software project risk analysis using Bayesian networks with causality constraints. *Decision Support Systems*, 2013, vol. 56, pp. 439–449. doi: 10.1016/j.dss.2012.11.001

7. Xu C., Xu G.Q., Li X.H., Feng Z.Y., Meng Z.P. Approach to security attack pattern networks on the basis of Bayesian networks. *Applied Mathematics & Information Sciences*, 2013, vol. 7, pp. 233–241.

8. Wu X., Wen X.Y., Li J., Yao L. A new dynamic Bayesian network approach for determining effective connectivity from fMRI data.* Neural Computing & Applications*, 2014, vol. 24, no. 1, pp. 91–97. doi: 10.1007/s00521-013-1465-0

9. Liu K.F.R., Lu C.F., Chen C.W., Shen Y.S. Applying Bayesian belief networks to health risk assessment. Stochastic Environmental Research and Risk Assessment, 2012, vol. 26, no. 3, pp. 451–465. doi: 10.1007/s00477-011-0470-z

10. Forio M.A.E., Landuyt D., Bennetsen E., Lock K., Nguyen T.H.T., Ambarita M.N.D., Musonge P.L.S., Boets P., Everaert G., Dominguez-Granda L., Goethals P.L.M. Bayesian belief network models to analyse and predict ecological water quality in rivers. *Ecological Modelling*, 2015, vol. 312, pp. 222–238. doi: 10.1016/j.ecolmodel.2015.05.025

11. Murray J.V., Stokes K.E., van Klinken R.D. Predicting the potential distribution of a riparian invasive plant: the effects of changing climate, flood regimes and land-use patterns. *Global Change Biology,* 2012, vol. 18, no. 5, pp. 1738–1753. doi: 10.1111/j.1365-2486.2011.02621.x

12. Francis R.A., Guikema S.D., Henneman L. Bayesian belief networks for predicting drinking water distribution system pipe breaks. *Reliability Engineering & System Safety*, 2014, vol. 130, pp. 1–11. doi: 10.1016/j.ress.2014.04.024

13. Mittal S., Gopal K., Maskara S.L. A novel Bayesian belief network structure learning algorithm based on bio-inspired monkey search meta heuristic. *Proc. 2014 7th Int. Conf. on Contemporary Computing, IC3.* Noida, India, 2014, pp. 141–147.

14. Li S.H., Zhang J., Sun B.L., Lei J. An incremental structure learning approach for Bayesian network. *Proc. 26th Chinese Control and Decision Conference, 2014 CCDC*. Changsha, China, 2014, pp. 4817–4822.

15. Samet S., Miri A., Granger E. Incremental learning of privacy-preserving Bayesian networks. *Applied Soft Computing*, 2013, vol. 13, no. 8, pp. 3657–3667. doi: 10.1016/j.asoc.2013.03.011

16. Sharp A.M. Incremental Algorithms: Solving Problems in a Changing World. PhD Dissertation. Cornell University, USA, 2007, 121 p. Available at: http://www.cs.cornell.edu/~asharp/papers/thesis.pdf.

17. Tulup'ev A.L. Algebraicheskie Baiesovskie Seti: Global'nyi Logiko-Veroyatnostnyi Vyvod v Derev'yakh Smezhnosti [Algebraical Bayesian Networks: Global Logic-Probabilistic Inference in the Adjacent Tree]. St. Petersburg, SPbSU Publ., Anatolia Publ., 2007, 40 p.

18. Tulup'ev A.L., Sirotkin A.V., Nikolenko S.I. Baiesovskie Seti Doveriya: Logiko-Veroyatnostnyi Vyvod v Atsiklicheskikh Napravlennykh Grafakh [Bayesian Belief Networks: Logical-Probabilistic Inference in the Acyclic Directed Graph]. St. Petersburg, SPbSU Publ., 2009, 400 p.

19. Tulupyev A.L., Stolyarov D.M., Mentyukov M.V. A representation for local and global structures of an algebraical Bayesian network in Java applications. *SPIIRAS Proceedings*, 2007, no. 5, pp. 71–99.

20. Oparin V.V., Tulupyev A.L. Synthesis of a joint graph with minimal number of edges: an algorithm formalization and a proof of correctness. *SPIIRAS Proceedings*, 2009, no. 11, pp. 142–157.

21. Oparin V.V., Filchenkov A.A., Sirotkin A.V., Tulupyev A.L. Matroidal representation for the adjacency graphs family built on a set of knowledge patterns. *Scientific and Technical Journal of Information Technologies, Mechanics and Optics*, 2010, no. 4 (68), pp. 73–76.

22. Filchenkov A.A. Sintez Grafov Smezhnosti v Mashinnom Obuchenii Global'nykh Struktur Algebraicheskikh Baiesovskikh Setei. Dis. ... kand. fiz.-mat. nauk [Synthesis Join Graph in Machine Learning of Global Structures of Algebraic Bayesian Networks. Dis. Phys.-Math. Sci.]. Samara, 2013, 339 p.

23. Filchenkov A.A., Tulupyev A.L., Sirotkin A.V. Algebraic Bayesian network minimal join graphs: formalization of synthesis basis and automated learning. *Fuzzy Systems and Soft Computing*, 2011, vol. 6, no. 2, pp. 145–163.

24. Zotov M.A., Tulupyev A.L Secondary structure synthesis in algebraic Bayesian networks: a statistical technique for complexity estimates and comparative analysis of greedy and straightforward algorithms. *Computer Tools in Education*, 2015, no. 1, pp. 3–18.

25. Zotov M.A., Tulupyev A.L., Sirotkin A.V. Complexity statistical estimates of straightforward and greedy algorithms for algebraic Bayesian networks syntesis. *Fuzzy Systems and Soft Computing*, 2015, vol. 10, no. 1, pp. 75–91.

26. Konstantinou N., Spanos D.E., Kouis D., Mitrou N. An approach for the incremental export of relational databases into RDF graphs. *International Journal on Artificial Intelligence Tools*, 2015, vol. 24, no. 2, art. 1540013. doi: 10.1142/S0218213015400138

27. Wycislik L., Warchal L. A performance comparison of several common computation tasks used in social network analysis performed on graph and relational databases. M*an-Machine Interactions 3*, 2014, vol. 242, pp. 651–659. doi: 10.1007/978-3-319-02309-0_70

28. Gao J., Zhou J.S., Yu J.X., Wang T.J. Shortest path computing in relational DBMSs. *IEEE Transactions on Knowledge and Data Engineering*, 2014, vol. 26, no. 4, pp. 997–1011. doi: 10.1109/TKDE.2013.43

29. Chen R.W. Managing massive graphs in relational DBMS. *Proc. IEEE Int. Conf. on Big Data.* Santa Clara, 2013, pp. 1–8.

30. Park J., Lee S.G. A graph-theoretic approach to optimize keyword queries in relational databases. *Knowledge and Information Systems*, 2014, vol. 41, no. 3, pp. 843–870. doi: 10.1007/s10115-013-0690-2