DOI: 10.17586/2226-1494-2017-17-1-129-136


Y. B. Abdullin, V. V. Ivanov


For citation: Abdullin Y.B., Ivanov V.V. Deep learning model for bilingual sentiment classification of short texts. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2017, vol. 17, no. 1, pp. 129–136. doi: 10.17586/2226-1494-2017-17-1-129-136


Sentiment analysis of short texts such as Twitter messages and comments on news portals is challenging due to the lack of contextual information. We propose a deep neural network model that uses bilingual word embeddings to effectively solve the sentiment classification problem for a given pair of languages. We apply our approach to two corpora covering two language pairs: English-Russian and Russian-Kazakh. We show how to train a classifier in one language and predict in another. Our approach achieves 73% accuracy for English and 74% accuracy for Russian. For Kazakh sentiment analysis, we propose a baseline method that achieves 60% accuracy, and a method to learn bilingual embeddings from a large unlabeled corpus using bilingual word pairs.
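One common way to build bilingual embeddings from bilingual word pairs, as the abstract describes, is to learn a linear map that projects source-language word vectors into the target-language embedding space; a classifier trained on target-language vectors can then score mapped source-language words. The sketch below illustrates this least-squares alignment idea on synthetic vectors; all names and the random data are illustrative assumptions, not the paper's exact training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50       # embedding dimensionality (illustrative)
n_pairs = 200  # size of the bilingual dictionary (illustrative)

# Pretend monolingual embeddings for a small dictionary of translation
# pairs (e.g. Russian-Kazakh): target vectors are a noisy linear image
# of the source vectors, so a linear map is recoverable.
src = rng.normal(size=(n_pairs, dim))
true_map = rng.normal(size=(dim, dim))
tgt = src @ true_map + 0.01 * rng.normal(size=(n_pairs, dim))

# Learn W minimizing ||src @ W - tgt||_F by least squares.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# A source word mapped through W should land near its translation's
# vector, which is what lets one classifier serve both languages.
mapped = src @ W
cos = np.sum(mapped * tgt, axis=1) / (
    np.linalg.norm(mapped, axis=1) * np.linalg.norm(tgt, axis=1))
print(f"mean cosine similarity: {cos.mean():.3f}")
```

In practice the source and target spaces come from separately trained word2vec-style models, and the dictionary is a list of translation pairs; the alignment step itself is this small.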

Keywords: sentiment analysis, bilingual word embeddings, recurrent neural networks, deep learning, Kazakh language

Acknowledgements. This work is supported by the Russian Science Foundation (project 15-11-10019 "Text mining models and methods for analysis of the needs, preferences and consumer behaviour"). The authors thank the Everware team for access to their platform. The authors would like to thank Yerlan Seitkazinov, Zarina Sadykova and Aliya Sitdikova for manual annotation of the Kazakh sentiment corpus.



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.