Nikiforov
Vladimir O.
D.Sc., Prof.
doi: 10.17586/2226-1494-2019-19-4-714-721
INTERACTION WITH INTERNET OF THINGS DEVICES BY VOICE CONTROL
Abstract
Subject of Research. The paper considers popular current voice assistants used for voice control of Internet of Things devices, such as Google Cloud Speech-to-Text, Amazon Transcribe, IBM Speech-to-Text, and Yandex SpeechKit, and identifies their pros and cons. Voice assistants that process data in the cloud and synchronize and control the user's mobile devices require an Internet connection. Voice assistants that can operate without an Internet connection can therefore be of significant practical value.
Method. An architectural model for on-site speech recognition (without Internet access) using mobile devices is proposed. CMU Sphinx software serves as the basis of the spontaneous speech recognition system. It uses acoustic and language models to recognize spontaneous speech and translates voice commands into commands that can be processed by a device control system based on the openHAB open platform. Approaches to creating the grammar and dictionary for speech recognition are proposed, and an example dictionary and grammar description for voice control of connected devices is given. To test the described approach, a demonstration stand was built on a Raspberry Pi single-board computer with openHAB installed. In addition, Internet of Things devices based on the ESP8266 microcontroller were built.
Main Results. Control of the Internet of Things devices and their interaction with the server are implemented using the MQTT protocol. Recognition of voice commands was tested, and the practical applicability of the proposed approach to spontaneous speech recognition is shown.
Practical Relevance. The proposed model describes a significant part of the Internet of Things devices available on the market and integrates them into the control system. Applying the model makes it possible to minimize or even eliminate the dependence of a voice control system for Internet of Things devices on external third-party services.
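The step of translating a recognized voice command into a device control message can be illustrated with a minimal sketch. It is not the authors' implementation: the command grammar, the device names, the MQTT topic layout, and the function `command_to_mqtt` are all hypothetical and chosen only for illustration. The sketch takes text as a recognizer might return it and maps it to an MQTT topic and payload, without performing any network I/O.

```python
import re
from typing import Optional, Tuple

# Hypothetical vocabulary: these device names and topics are assumptions,
# not values taken from the paper.
DEVICES = {"lamp": "home/livingroom/lamp", "fan": "home/bedroom/fan"}
ACTIONS = {"on": "ON", "off": "OFF"}

# A tiny command grammar with two accepted word orders:
#   "turn <action> [the] <device>"  and  "turn [the] <device> <action>"
_PATTERNS = [
    re.compile(r"^turn (?P<action>on|off) (?:the )?(?P<device>\w+)$"),
    re.compile(r"^turn (?:the )?(?P<device>\w+) (?P<action>on|off)$"),
]


def command_to_mqtt(text: str) -> Optional[Tuple[str, str]]:
    """Map recognized text to an (MQTT topic, payload) pair, or None."""
    # Normalize case and whitespace before matching against the grammar.
    normalized = " ".join(text.lower().split())
    for pattern in _PATTERNS:
        match = pattern.match(normalized)
        if match and match.group("device") in DEVICES:
            topic = DEVICES[match.group("device")] + "/set"
            return topic, ACTIONS[match.group("action")]
    # The phrase is outside the grammar or names an unknown device.
    return None
```

In a real setup the returned pair would be handed to an MQTT client for publishing to the broker; rejecting phrases outside the grammar keeps misrecognized speech from triggering a device.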