INTERACTION WITH INTERNET OF THINGS DEVICES BY VOICE CONTROL

Vladislav N. Shmatkov, Patryk Bąkowski, Dmitry S. Medvedev, Sergey V. Korzukhin, Denis V. Golendukhin, Sergey F. Spynu, Dmitry I. Mouromtsev

2019, Volume 19, Number 4 (July-August)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2019-19-4-714-721

For citation:

Shmatkov V.N., Bąkowski P., Medvedev D.S., Korzukhin S.V., Golendukhin D.V., Spynu S.F., Mouromtsev D.I. Interaction with Internet of Things devices by voice control. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2019, vol. 19, no. 4, pp. 714–721 (in Russian).

Abstract

Subject of Research. The paper considers currently popular speech recognition services used for voice control of Internet of Things devices, such as Google Cloud Speech-to-Text, Amazon Transcribe, IBM Speech-to-Text and Yandex SpeechKit, and identifies their advantages and drawbacks. Voice assistants that process data in the cloud and synchronize and control the user's mobile devices need an Internet connection to operate. Voice assistants that can work without an Internet connection can therefore have significant practical value.

Method. An architectural model for on-site speech recognition (without the Internet) on mobile devices is proposed. CMU Sphinx is used as the basis of the spontaneous speech recognition system. The software uses both acoustic and language models to recognize spontaneous speech, and it also converts voice commands into a form that can be processed by a device control system built on the openHAB open platform. Approaches to creating grammars and dictionaries for speech recognition are proposed, and example dictionary and grammar descriptions for voice control of connected devices are given. To test the approach, a demonstration stand was built on a single-board Raspberry Pi computer running openHAB. In addition, Internet of Things devices based on the ESP8266 microcontroller were built.

Main Results. Control of the Internet of Things devices and their interaction with the server are implemented over the MQTT protocol. Recognition of voice commands was tested, and the proposed approach to spontaneous speech recognition is shown to be applicable in practice.

Practical Relevance. The proposed model can describe a significant share of the Internet of Things devices available on the market and integrate them into a control system. Applying the model makes it possible to minimize, or even eliminate, the dependence of the voice control system's operability on external third-party services.
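The abstract describes restricting recognition with a grammar and translating recognized voice commands into commands that an openHAB-based control system can process, with devices controlled over MQTT. The Python sketch below illustrates that translation step under explicit assumptions: the English command phrases, the device words, the JSGF rule layout and the MQTT topic scheme `home/<device>/set` are all hypothetical, not the authors' grammar or openHAB's own topic layout. Actually publishing the message (for example with an MQTT client library) is left out so the sketch stays self-contained.

```python
from typing import Optional, Tuple

# Hypothetical vocabulary (illustration only, not taken from the paper):
# spoken device words mapped to device identifiers, and spoken actions
# mapped to MQTT payloads.
DEVICES = {"light": "light", "lamp": "light", "fan": "fan", "heater": "heater"}
ACTIONS = {"on": "ON", "off": "OFF"}


def build_jsgf_grammar() -> str:
    """Build a JSGF grammar limiting recognition to the command set.

    CMU Sphinx decoders can use a JSGF grammar instead of a statistical
    language model; a closed grammar keeps the search space small.
    """
    actions = " | ".join(sorted(ACTIONS))
    devices = " | ".join(sorted(DEVICES))
    return (
        "#JSGF V1.0;\n"
        "grammar commands;\n"
        "public <command> = turn <action> [the] <device>;\n"
        f"<action> = {actions};\n"
        f"<device> = {devices};\n"
    )


def to_mqtt_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Translate a recognized utterance into a (topic, payload) pair.

    Accepts only 'turn <action> [the] <device>' and returns None otherwise.
    The topic scheme 'home/<device>/set' is a hypothetical convention.
    """
    words = utterance.lower().split()
    if len(words) < 3 or words[0] != "turn" or words[1] not in ACTIONS:
        return None
    rest = words[2:]
    if rest[0] == "the":  # optional article, as in the grammar
        rest = rest[1:]
    if len(rest) != 1 or rest[0] not in DEVICES:
        return None
    return f"home/{DEVICES[rest[0]]}/set", ACTIONS[words[1]]


if __name__ == "__main__":
    print(build_jsgf_grammar())
    print(to_mqtt_command("turn on the lamp"))  # ('home/light/set', 'ON')
```

A closed grammar of this kind trades flexibility for robustness: the decoder can only return phrases the grammar allows, which suits a small, fixed command set on hardware such as a Raspberry Pi.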

Keywords: human-computer interaction, IoT, Internet of Things, voice control, smart home, device control

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License