doi: 10.17586/2226-1494-2018-18-6-1084-1090
REINFORCED SEQ2SEQ ADVERSARIAL AUTOENCODER FOR DE NOVO MOLECULAR DESIGN
For citation:
Putin E.O. Reinforced seq2seq adversarial autoencoder for de novo molecular design. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2018, vol. 18, no. 6, pp. 1084–1090 (in Russian). doi: 10.17586/2226-1494-2018-18-6-1084-1090
Abstract
Subject of Research. Modern deep learning models for generating target small organic molecules are studied. The studies were carried out on two datasets: 250,000 drug-like molecular compounds from the ZINC database and 23,000 kinase molecular structures collected manually from the open-access ChEMBL database. Method. We propose a deep neural network model based on the concepts of adversarial learning and reinforcement learning. The model controls the molecular validity of the generated structures through the use of a recurrent seq2seq autoencoder and an external generator. The external generator gives the model flexibility in the choice of architecture and also allows conditions to be supplied as input to the generation. Main Results. Comparative experiments have shown that the proposed model outperforms its closest competitors, in experiments with pre-training and with post-training, at generating valid and unique molecular structures. Additional chemical analysis of the generated structures shows that the proposed model produces structures of higher quality than the competing models. Practical Relevance. The proposed model can be used by medicinal chemists as an intelligent assistant in the development of new drugs.
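The abstract compares models by how many valid and unique structures they generate but does not define the two metrics. The sketch below illustrates one common reading, offered as an assumption and not as the paper's own definition: validity is the fraction of generated SMILES strings that pass a validity check, and uniqueness is the fraction of distinct strings among the valid ones. The names `toy_is_valid` and `validity_and_uniqueness` are hypothetical. The toy check only tests bracket balance and ring-closure digit parity, so it rejects some legitimate SMILES (for example, those with digits inside atom brackets); in practice a cheminformatics toolkit parser would replace it.

```python
from typing import Callable, Iterable, Tuple


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in for a real chemistry check (NOT a SMILES parser).

    Accepts a string if it is non-empty, its parentheses are balanced,
    and every ring-closure digit occurs an even number of times.
    """
    if not smiles:
        return False
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket without a matching opener
                return False
    if depth != 0:
        return False
    return all(smiles.count(d) % 2 == 0 for d in "123456789")


def validity_and_uniqueness(
    samples: Iterable[str],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> Tuple[float, float]:
    """Return (validity, uniqueness) under the assumed definitions.

    validity   = valid samples / all samples
    uniqueness = distinct valid samples / valid samples
    Duplicates are detected by exact string match, not by canonical form.
    """
    samples = list(samples)
    valid = [s for s in samples if is_valid(s)]
    validity = len(valid) / len(samples) if samples else 0.0
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


generated = ["CCO", "CCO", "c1ccccc1", "C(C", "C1CC"]
validity, uniqueness = validity_and_uniqueness(generated)
```

Here three of the five strings pass the toy check and two of those three are distinct. Passing the check as a parameter keeps the metric code independent of whichever validator is actually used.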
Acknowledgements. This work was financially supported by the Government of the Russian Federation, Grant 074-U01, and the Russian Foundation for Basic Research, Grant 16-37-60115 mol_a_dk.