DOI: 10.17586/2226-1494-2019-19-4-714-721


V. N. Shmatkov, P. Bąkowski, D. S. Medvedev, S. V. Korzukhin, D. V. Golendukhin, S. F. Spynu, D. I. Mouromtsev

Article in Russian

For citation:
Shmatkov V.N., Bąkowski P., Medvedev D.S., Korzukhin S.V., Golendukhin D.V., Spynu S.F., Mouromtsev D.I. Interaction with Internet of Things devices by voice control. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2019, vol. 19, no. 4, pp. 714–721 (in Russian).
doi: 10.17586/2226-1494-2019-19-4-714-721


Subject of Research. The paper considers currently popular voice assistants for voice control of Internet of Things devices, such as Google Cloud Speech-to-Text, Amazon Transcribe, IBM Speech-to-Text, and Yandex SpeechKit. Their advantages and drawbacks are identified. Such voice assistants require an Internet connection, because they process data in the cloud and synchronize and control the user’s mobile devices. Voice assistants that can operate without an Internet connection can therefore be of significant practical value. Method. An architectural model for on-site speech recognition (without Internet access) using mobile devices is proposed. CMU Sphinx software serves as the basis of the spontaneous speech recognition system. The software uses both acoustic and language models to recognize spontaneous speech and translates voice commands into commands that can be processed by a device control system based on the open platform OpenHab. Approaches to creating grammars and dictionaries for speech recognition are proposed. An example of a dictionary and grammar description for voice control of connected devices is given. To test the described approach, a demonstration stand was built on a Raspberry Pi single-board computer with OpenHab installed. In addition, Internet of Things devices based on the ESP8266 microcontroller were built. Main Results. Control of Internet of Things devices and their interaction with the server are implemented using the MQTT protocol. Recognition of voice commands was tested. The practical applicability of the proposed approach to spontaneous speech recognition is demonstrated. Practical Relevance. The proposed model describes a significant part of the Internet of Things devices available on the market and integrates them into the control system. Applying the model makes it possible to minimize or even eliminate the dependence of a voice control system for Internet of Things devices on external third-party services.
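
The step of translating a recognized voice command into a device command can be illustrated with a minimal sketch. It is not the authors' implementation: the command grammar, the device names, the MQTT topics, and the function `command_to_mqtt` are all hypothetical and chosen only for illustration. The sketch assumes the recognizer has already returned the utterance as text, and it only builds the MQTT topic and payload; publishing them would be left to an MQTT client.

```python
import re
from typing import Optional, Tuple

# Hypothetical device names and MQTT topics; not taken from the paper.
DEVICE_TOPICS = {
    "lamp": "home/livingroom/lamp/set",
    "fan": "home/bedroom/fan/set",
}

# A tiny illustrative command grammar: "turn (on|off) [the] <device>".
COMMAND_PATTERN = re.compile(r"^turn (on|off) (?:the )?(\w+)$")


def command_to_mqtt(utterance: str) -> Optional[Tuple[str, str]]:
    """Map recognized text to an (MQTT topic, payload) pair.

    Returns None when the utterance does not fit the grammar
    or names a device that is not in DEVICE_TOPICS.
    """
    match = COMMAND_PATTERN.match(utterance.strip().lower())
    if match is None:
        return None
    state, device = match.groups()
    topic = DEVICE_TOPICS.get(device)
    if topic is None:
        return None
    # Payload is the upper-cased state, "ON" or "OFF" (an assumed convention).
    return topic, state.upper()
```

Restricting recognition to a small fixed grammar like this keeps the mapping from speech to device commands unambiguous, which fits an offline setup where no cloud service is available to interpret free-form requests.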

Keywords: human-computer interaction, IoT, Internet of Things, voice control, smart home, device control

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
Copyright 2001-2019 ©
Scientific and Technical Journal
of Information Technologies, Mechanics and Optics.
All rights reserved.