doi: 10.17586/2226-1494-2024-24-4-594-601


Predicting gene-disease associations using a heterogeneous graph neural network

D. A. Sidorenko, A. A. Shalyto


Article in Russian

For citation:
Sidorenko D.A., Shalyto A.A. Predicting gene-disease associations using a heterogeneous graph neural network. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2024, vol. 24, no. 4, pp. 594–601 (in Russian). doi: 10.17586/2226-1494-2024-24-4-594-601


Abstract
The research presents a heterogeneous graph neural network model for predicting gene-disease associations from existing genomic and medical data. The novelty of the approach lies in combining the principles of graph neural networks and heterogeneous information networks, which allows structured data to be processed efficiently and complex gene-pathology interactions to be taken into account. The proposed solution is a heterogeneous graph neural network that uses a heterogeneous graph structure to represent genes, diseases, and the relationships between them. The model was evaluated on the DisGeNET, LASTFM, and YELP datasets, where it was compared with current state-of-the-art (SOTA) models. The comparison showed that the proposed model outperforms the other models in terms of Average Precision (AP), F1-measure (F1@S), Hit@k, and Area Under the Receiver Operating Characteristic curve (AUROC) in predicting “gene-disease” associations. The model can serve as a tool for bioinformatics analysis and can help researchers and physicians study genetic diseases, which could speed up the discovery of new drug targets and the advancement of personalized medicine.
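The abstract names the ranking metrics used for comparison but does not define them. The sketch below shows one common way to compute AUROC, AP, and Hit@k for link prediction; it is an illustration under stated assumptions, not the authors' evaluation code. The function names are hypothetical, and the Hit@k convention (each held-out positive link ranked against its own sampled negative links) is an assumption, since evaluation protocols differ between papers. F1@S is omitted because the abstract does not specify what S denotes.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This O(P*N) pairwise form suits small
    evaluation sets; rank-based formulas scale better.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@rank taken at the rank of every positive pair.

    Pairs are sorted by descending score; the sort is stable, so ties
    are broken by input order (no special tie handling).
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("AP needs at least one positive")
    return sum(precisions) / len(precisions)


def hit_at_k(queries: Sequence[Tuple[float, Sequence[float]]], k: int) -> float:
    """Fraction of held-out positive links ranked within the top k of their candidates.

    Assumed protocol: each query is (score of the true gene-disease link,
    scores of sampled negative links); rank = 1 + number of negatives
    scored strictly higher than the positive.
    """
    if not queries:
        raise ValueError("no queries")
    hits = 0
    for pos_score, neg_scores in queries:
        rank = 1 + sum(1 for s in neg_scores if s > pos_score)
        if rank <= k:
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    # Toy scores for six candidate gene-disease pairs (1 = known association).
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    print(round(auroc(labels, scores), 4))              # 0.7778 (= 7/9)
    print(round(average_precision(labels, scores), 4))  # 0.8056 (= 29/36)
    queries = [(0.7, [0.9, 0.8, 0.1]), (0.95, [0.2, 0.3])]
    print(hit_at_k(queries, 1), hit_at_k(queries, 3))   # 0.5 1.0
```

On inputs without tied scores, the AUROC and AP values computed this way should agree with scikit-learn's `roc_auc_score` and `average_precision_score`; the pairwise AUROC form is chosen here only because it makes the "positive outranks negative" interpretation explicit.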

Keywords: machine learning, graph neural networks, heterogeneous information networks, bioinformatics, genetics, “gene-disease” association prediction



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright © 2001–2024 Scientific and Technical Journal of Information Technologies, Mechanics and Optics. All rights reserved.