doi: 10.17586/2226-1494-2025-25-3-487-497


Method for identifying the active module in biological graphs with multi-component vertex weights

D. A. Usoltsev, I. I. Molotkov, M. N. Artomov, A. A. Sergushichev, A. A. Shalyto


Article in Russian

For citation:
Usoltsev D.A., Molotkov I.I., Artomov M.N., Sergushichev A.A., Shalyto A.A. Method for identifying the active module in biological graphs with multi-component vertex weights. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2025, vol. 25, no. 3, pp. 487–497 (in Russian). doi: 10.17586/2226-1494-2025-25-3-487-497


Abstract
An active module in a biological graph is a connected subgraph whose vertices share a common biological function. To identify an active module, one must first construct a weighted biological graph in which the weight of each vertex is calculated from biological experiments investigating the target function. However, a single experiment may not fully describe the desired active module: it may cover only part of the module and introduce uncertainty into the vertex weights. This work demonstrates that integrating data from multiple experiments with Fisher’s method, and then applying an approach based on Markov chain Monte Carlo (MCMC) and machine learning to the combined result, enables more effective identification of active modules in biological graphs. The study uses the InWebIM protein–protein interaction graph, a human brain reconstruction graph from the BigBrain project, and a gene graph for the organism Caenorhabditis elegans. Fisher’s method is applied to combine the results of several experiments into a single outcome within one graph, after which active modules are searched for using the MCMC and machine learning–based method. To validate the proposed method on real data, results from genome-wide association studies of schizophrenia and smoking are used, along with the gene expression matrix of patients with skin melanoma from the TCGA project. Applying Fisher’s method makes it possible to consider the results of multiple biological experiments simultaneously. Subsequent use of the MCMC and machine learning–based method improves the accuracy of identifying active modules compared to ranking graph vertices solely by Fisher’s method. Considering the results of multiple biological experiments plays a crucial role in increasing the accuracy of identifying the vertices of the active module. This, in turn, promotes a deeper understanding of the biological mechanisms of diseases, which can be of great significance for the development of new diagnostic and therapeutic methods.
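
As an illustration of the combination step only, the sketch below applies Fisher’s method to per-vertex p-values from several experiments and ranks vertices by the combined value, which is the baseline ranking the abstract compares against. It is not the authors’ implementation: it assumes that each experiment yields one p-value per vertex, that the experiments are independent, and that the vertex names and values are hypothetical. The MCMC and machine learning stage is not shown.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values for one vertex using Fisher's method."""
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Half of Fisher's statistic: X / 2 = -sum(ln p_i).
    # Under the null hypothesis X follows a chi-square law with 2k degrees of freedom.
    half_stat = -sum(math.log(p) for p in p_values)
    # For an even number of degrees of freedom the chi-square tail is closed-form:
    # P(X > x) = exp(-x / 2) * sum_{i=0}^{k-1} (x / 2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values for two vertices across three experiments (illustrative only).
vertex_p_values = {
    "GENE_A": [0.01, 0.03, 0.20],
    "GENE_B": [0.40, 0.65, 0.90],
}

combined = {vertex: fisher_combine(ps) for vertex, ps in vertex_p_values.items()}
# Rank vertices from most to least significant combined p-value.
ranking = sorted(combined, key=combined.get)
```

The closed-form tail avoids a dependency on a statistics library; where SciPy is available, `scipy.stats.combine_pvalues` with `method="fisher"` computes the same quantity.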

Keywords: graphs, MCMC, Fisher’s method, biological graphs, active module




This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright 2001-2025 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.
