Method for identifying the active module in biological graphs with multi-component vertex weights

Dmitrii A. Usoltsev, Ivan I. Molotkov, Mykyta N. Artomov, Alexey A. Sergushichev, Anatoly A. Shalyto

2025, Volume 25, Number 3 (May-June)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2025-25-3-487-497

Article in Russian

For citation:

Usoltsev D.A., Molotkov I.I., Artomov M.N., Sergushichev A.A., Shalyto A.A. Method for identifying the active module in biological graphs with multi-component vertex weights. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2025, vol. 25, no. 3, pp. 487–497 (in Russian). doi: 10.17586/2226-1494-2025-25-3-487-497

Abstract

An active module in a biological graph is a connected subgraph whose vertices share a common biological function. To identify an active module, one must first construct a weighted biological graph in which the weight of each vertex is calculated from biological experiments that investigate the target function. However, the results of a single experiment may not fully describe the desired active module: they may cover only part of it and may introduce uncertainty into the vertex weights. This work demonstrates that using Fisher’s method to integrate data from multiple experiments, and then applying a Markov chain Monte Carlo (MCMC) and machine learning-based approach to the combined results, enables more effective identification of active modules in biological graphs. The study uses the InWebIM protein–protein interaction graph, a human brain reconstruction graph from the BigBrain project, and a gene graph for the organism Caenorhabditis elegans. Fisher’s method is applied to combine the results of several experiments into a single outcome within one graph, after which the search for active modules is conducted using the MCMC and machine learning-based method. To validate the proposed method on real data, results from genome-wide association studies of schizophrenia and smoking are used, along with the gene expression matrix of patients with skin melanoma from the TCGA project. Applying Fisher’s method makes it possible to consider the results of multiple biological experiments simultaneously. Subsequent use of the MCMC and machine learning-based method improves the accuracy of identifying active modules compared to ranking graph vertices by Fisher’s method alone. Taking the results of multiple biological experiments into account increases the accuracy of identifying the vertices of the active module. This, in turn, promotes a deeper understanding of the biological mechanisms of diseases, which can be significant for the development of new diagnostic and therapeutic methods.
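
The data-integration step named in the abstract can be illustrated with a minimal sketch of Fisher’s method in Python. This is not the authors’ implementation: the function name, the vertex names, and the p-values are hypothetical, and the sketch assumes that the per-experiment p-values for a vertex are independent. It covers only the combination and ranking of vertices; the subsequent MCMC and machine learning-based search is not shown.

```python
import math


def fisher_combine(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis (k = number of
    p-values). For even degrees of freedom the survival function has a
    closed form, so only the standard library is needed.
    Returns (statistic, combined_p_value).
    """
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2

    # P(chi2 with 2k d.o.f. > X) = exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return 2.0 * half_stat, math.exp(-half_stat) * total


# Hypothetical p-values for three vertices from three experiments
# (illustrative numbers only, not data from the study).
vertex_pvalues = {
    "gene_a": [0.01, 0.03, 0.20],
    "gene_b": [0.50, 0.60, 0.40],
    "gene_c": [0.04, 0.04, 0.04],
}

# One combined p-value per vertex, then a ranking from most to least significant.
combined = {v: fisher_combine(ps)[1] for v, ps in vertex_pvalues.items()}
ranking = sorted(combined, key=combined.get)
```

Ranking vertices by the combined p-value corresponds to the baseline that the abstract compares against; in the described method these combined values would instead serve as vertex weights for the module search.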

Keywords: graphs, MCMC, Fisher’s method, biological graphs, active module

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License