REINFORCED SEQ2SEQ ADVERSARIAL AUTOENCODER FOR DE NOVO MOLECULAR DESIGN

Evgeniy O. Putin

doi:10.17586/2226-1494-2018-18-6-1084-1090

2018, VOLUME 18, NUMBER 6 (November-December 2018)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

For citation:

Putin E.O. Reinforced seq2seq adversarial autoencoder for de novo molecular design. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2018, vol. 18, no. 6, pp. 1084–1090 (in Russian). doi: 10.17586/2226-1494-2018-18-6-1084-1090

Abstract

Subject of Research. We study modern deep learning models for generating target small organic molecules. The experiments used two datasets: 250,000 drug-like compounds from the ZINC database and 23,000 kinase molecular structures collected manually from the open-access ChEMBL database.

Method. We propose a deep neural network model based on adversarial learning and reinforcement learning. The model controls the validity of the generated molecular structures by means of a recurrent seq2seq autoencoder and an external generator. Because the generator is external, the model is flexible in the choice of generator architecture and also allows generation conditions to be supplied as input.

Main Results. Comparative experiments show that the proposed model outperforms its closest competitors in generating valid and unique molecular structures, both in experiments with pre-training and in experiments with post-training. Additional chemical analysis of the generated structures shows that the proposed model produces higher-quality structures than the competing models.

Practical Relevance. The proposed model can serve medicinal chemists as an intelligent assistant in the development of new drugs.
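The abstract states that the generator is trained with reinforcement learning inside an adversarial setup but does not give the update rule. The sketch below assumes a REINFORCE-style policy gradient with a mean-reward baseline, a common choice for sequence generators trained with RL but not confirmed by the abstract. It replaces the recurrent generator with a position-wise categorical policy over a hypothetical six-token vocabulary, and `toy_reward` is a hypothetical stand-in for the discriminator and validity signals the real model would use; `ToyPolicy`, `train` and all parameter values are illustrative assumptions.

```python
import math
import random

# Hypothetical toy vocabulary standing in for SMILES tokens.
VOCAB = ["C", "O", "N", "(", ")", "="]
SEQ_LEN = 4


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyPolicy:
    """Position-wise categorical policy: one independent logit vector per position.

    A hypothetical stand-in for a recurrent generator; it has no memory of earlier tokens.
    """

    def __init__(self, seq_len, vocab_size):
        self.logits = [[0.0] * vocab_size for _ in range(seq_len)]

    def sample(self, rng):
        """Draw one sequence of token indices."""
        return [
            rng.choices(range(len(row)), weights=softmax(row))[0]
            for row in self.logits
        ]

    def reinforce_step(self, batch, rewards, lr):
        """One REINFORCE update with a mean-reward baseline.

        d log pi(tok) / d logits = one_hot(tok) - probs, weighted by the advantage.
        """
        baseline = sum(rewards) / len(rewards)
        probs = [softmax(row) for row in self.logits]  # policy before this update
        grads = [[0.0] * len(row) for row in self.logits]
        for seq, reward in zip(batch, rewards):
            advantage = reward - baseline
            for pos, tok in enumerate(seq):
                for k, p in enumerate(probs[pos]):
                    grads[pos][k] += advantage * ((1.0 if k == tok else 0.0) - p)
        # Gradient ascent on expected reward, averaged over the batch.
        for pos, row in enumerate(self.logits):
            for k in range(len(row)):
                row[k] += lr * grads[pos][k] / len(batch)


def toy_reward(seq):
    """Hypothetical stand-in for a discriminator/validity score: fraction of 'C' tokens."""
    return sum(1.0 for tok in seq if VOCAB[tok] == "C") / len(seq)


def train(steps=200, batch_size=32, lr=0.5, seed=0):
    """Sample batches, score them with toy_reward, and apply REINFORCE updates."""
    rng = random.Random(seed)
    policy = ToyPolicy(SEQ_LEN, len(VOCAB))
    for _ in range(steps):
        batch = [policy.sample(rng) for _ in range(batch_size)]
        rewards = [toy_reward(seq) for seq in batch]
        policy.reinforce_step(batch, rewards, lr)
    return policy


if __name__ == "__main__":
    trained = train()
    for pos, row in enumerate(trained.logits):
        print(pos, round(softmax(row)[VOCAB.index("C")], 3))
```

Running the script prints, for each position, the probability of the token "C"; because the toy reward favours that token, it should rise well above the uniform 1/6. Subtracting the mean batch reward reduces the variance of the update while leaving its expected direction unchanged.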
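The abstract compares models by how many valid and unique structures they generate but does not define either metric. The sketch below assumes validity = fraction of generated SMILES strings that pass a validity check and uniqueness = distinct valid structures divided by valid samples; the paper's exact definitions may differ. `generation_metrics` takes the checker as an injected callable, and `toy_is_valid` is a hypothetical placeholder that only checks parenthesis balance, not chemistry. With RDKit one could pass `lambda s: Chem.MolFromSmiles(s) is not None`, since `MolFromSmiles` returns `None` for strings it cannot parse.

```python
from typing import Callable, Dict, Iterable, Optional


def generation_metrics(
    smiles: Iterable[str],
    is_valid: Callable[[str], bool],
    canonicalize: Optional[Callable[[str], str]] = None,
) -> Dict[str, float]:
    """Validity and uniqueness of a batch of generated SMILES strings.

    validity   = valid samples / all samples
    uniqueness = distinct valid structures / valid samples
    """
    samples = list(smiles)
    if not samples:
        return {"validity": 0.0, "uniqueness": 0.0}
    valid = [s for s in samples if is_valid(s)]
    validity = len(valid) / len(samples)
    if not valid:
        return {"validity": validity, "uniqueness": 0.0}
    # Without canonicalization, "CCO" and "OCC" (both ethanol) count as distinct.
    keys = {canonicalize(s) if canonicalize else s for s in valid}
    return {"validity": validity, "uniqueness": len(keys) / len(valid)}


def toy_is_valid(s: str) -> bool:
    """Hypothetical placeholder: non-empty with balanced parentheses.

    This is NOT a chemical validity test.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and len(s) > 0


if __name__ == "__main__":
    batch = ["CCO", "CCO", "c1ccccc1", "CC(C", "OCC"]
    print(generation_metrics(batch, toy_is_valid))
    # prints {'validity': 0.8, 'uniqueness': 0.75}
```

Passing an RDKit canonicalizer such as `lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s))` would make "CCO" and "OCC" count as a single structure, which is usually what a uniqueness metric intends.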

Acknowledgements. This work was financially supported by the Government of the Russian Federation, Grant 074-U01, and the Russian Foundation for Basic Research, Grant 16-37-60115 mol_a_dk.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License