doi: 10.17586/2226-1494-2024-24-6-962-971


Application of Markov chain Monte Carlo and machine learning for identifying active modules in biological graphs

D. A. Usoltsev, I. I. Molotkov, M. N. Artomov, A. A. Sergushichev, A. A. Shalyto


Article in Russian

For citation:
Usoltsev D.A., Molotkov I.I., Artomov M.N., Sergushichev A.A., Shalyto A.A. Application of Markov chain Monte Carlo and machine learning for identifying active modules in biological graphs. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 6, pp. 962–971 (in Russian). doi: 10.17586/2226-1494-2024-24-6-962-971


Abstract
In biology, information about interactions between the proteins or genes under study can be represented as a biological graph. A connected subgraph whose vertices perform a common biological function is called an active module. Markov chain Monte Carlo (MCMC) is an effective method for identifying active modules in biological graphs. For protein-protein interactions, accurately identifying the active module makes it possible to determine which disruptions of protein function lead to particular changes (e.g., diseases) in a biological system (a cell or an organism). This study demonstrates that combining MCMC with models that take graph topology into account identifies the active module more accurately. Two networks are used independently, the InWebIM protein-protein interaction graph and the GeneMANIA functional association network, both to train the model and to compare it with the known MCMC-based method. The active module is searched for with a combination of MCMC and a machine learning method, gradient boosting (xgboost). On simulated data, this combination identifies the active module more accurately than the MCMC-based method alone. More accurate identification of active modules is crucial for studying the biological mechanisms of diseases and for discovering individual proteins functionally associated with disease development.

Keywords: graphs, machine learning, protein networks, MCMC, active module
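
To make the idea of MCMC over active modules concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified Metropolis sampler whose states are connected vertex sets and whose target probability is proportional to the product of per-vertex scores. The function names, the toy graph, and the scores are all hypothetical. The abstract does not say how the gradient boosting model enters the procedure, so the scores are simply supplied as input here.

```python
import random
from collections import deque


def is_connected(vertices, adj):
    """Return True if the vertex set is non-empty and induces a connected subgraph."""
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(vertices)


def sample_module_frequencies(adj, weight, start, n_steps, seed=0):
    """Estimate how often each vertex belongs to the sampled module.

    Assumption: the target distribution over connected vertex sets S is
    proportional to the product of weight[v] for v in S, with all weights > 0.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    current = {start}
    counts = {v: 0 for v in nodes}
    for _ in range(n_steps):
        # Symmetric proposal: pick any vertex uniformly and toggle its membership.
        v = rng.choice(nodes)
        if v in current:
            proposal = current - {v}
            ratio = 1.0 / weight[v]
        else:
            proposal = current | {v}
            ratio = weight[v]
        # Reject proposals that are empty or disconnected; otherwise apply
        # the Metropolis acceptance rule min(1, target(proposal) / target(current)).
        if is_connected(proposal, adj) and rng.random() < min(1.0, ratio):
            current = proposal
        for u in current:
            counts[u] += 1
    return {v: counts[v] / n_steps for v in nodes}


# Hypothetical toy data: a path graph a-b-c-d-e in which a, b, c carry signal.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
weight = {"a": 5.0, "b": 5.0, "c": 5.0, "d": 0.2, "e": 0.2}
freq = sample_module_frequencies(adj, weight, start="c", n_steps=20000, seed=1)
for v in sorted(freq):
    print(v, round(freq[v], 3))
```

Under these assumptions, vertices with scores above 1 should appear in the sampled module far more often than vertices with scores below 1, so ranking vertices by the estimated frequency gives a simple way to read off a candidate module.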



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License