doi: 10.17586/2226-1494-2025-25-3-487-497
Method for identifying the active module in biological graphs with multi-component vertex weights
Article in Russian
For citation:
Usoltsev D.A., Molotkov I.I., Artomov M.N., Sergushichev A.A., Shalyto A.A. Method for identifying the active module in biological graphs with multi-component vertex weights. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2025, vol. 25, no. 3, pp. 487–497 (in Russian). doi: 10.17586/2226-1494-2025-25-3-487-497
Abstract
An active module in a biological graph is a connected subgraph whose vertices share a common biological function. Identifying an active module first requires constructing a weighted biological graph, in which the weight of each vertex is calculated from biological experiments that investigate the target function. However, a single experiment may not fully describe the desired active module: it may cover only part of the module and introduce uncertainty into the vertex weights. This work demonstrates that integrating data from multiple experiments with Fisher’s method, and then applying a Markov chain Monte Carlo (MCMC) and machine learning–based approach to the combined results, enables more effective identification of active modules in biological graphs. The study uses the InWebIM protein–protein interaction graph, a human brain reconstruction graph from the BigBrain project, and a gene graph for the organism Caenorhabditis elegans. Fisher’s method is applied to combine the results of several experiments into a single outcome within one graph, after which the search for active modules is conducted using the MCMC and machine learning–based method. To validate the proposed method on real data, results from genome-wide association studies (GWAS) on schizophrenia and smoking are used, along with the gene expression matrix of patients with skin melanoma from the TCGA project. Applying Fisher’s method makes it possible to consider the results of multiple biological experiments simultaneously. Subsequent use of the MCMC and machine learning–based method improves the accuracy of identifying active modules compared with ranking graph vertices solely by Fisher’s method. Taking the results of multiple biological experiments into account therefore increases the accuracy with which the vertices of the active module are identified.
This, in turn, promotes a deeper understanding of the biological mechanisms of diseases, which may be significant for the development of new diagnostic and therapeutic methods.
Keywords: graphs, MCMC, Fisher’s method, biological graphs, active module
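As an illustration of the combination step described in the abstract, the following is a minimal sketch of Fisher’s method applied to per-vertex p-values. It is not the authors’ implementation. It assumes each experiment yields one p-value per vertex and that the experiments are independent, which is the standard assumption of Fisher’s method. The function name `fisher_combine`, the vertex labels, and the p-values are hypothetical. The statistic is X = -2 Σ ln p_i, which under the null hypothesis follows a chi-squared distribution with 2k degrees of freedom for k experiments. The subsequent MCMC and machine learning step is not shown.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    is compared against a chi-squared distribution with 2k degrees of freedom.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2

    # For an even number of degrees of freedom (2k), the chi-squared survival
    # function has the closed form exp(-t) * sum_{i=0}^{k-1} t**i / i!,
    # where t = statistic / 2.
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return 2.0 * half_stat, math.exp(-half_stat) * series


# Hypothetical vertex weights: one p-value per experiment for each vertex.
vertex_p_values = {
    "gene_a": [0.05, 0.05],
    "gene_b": [0.40, 0.70],
    "gene_c": [0.001, 0.20],
}

# Combined p-value per vertex, then vertices ranked from most to least significant.
combined = {v: fisher_combine(ps)[1] for v, ps in vertex_p_values.items()}
ranking = sorted(combined, key=combined.get)
```

Ranking vertices by the combined p-value alone corresponds to the baseline that the abstract compares against. In the approach described, these combined values would instead serve as vertex weights for the module search.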