doi: 10.17586/2226-1494-2019-19-3-557-559
AUTOMATIC SPEECH RECOGNITION IN PRESENCE OF MUSIC NOISE ON MULTICHANNEL FAR-FIELD RECORDINGS
Article in Russian
For citation:
Astapov S.S., Shuranov E.V., Lavrentyev A.V., Kabarov V.I. Automatic speech recognition in presence of music noise on multichannel far-field recordings. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2019, vol. 19, no. 3, pp. 557–559 (in Russian). doi: 10.17586/2226-1494-2019-19-3-557-559
Abstract
Subject of Research. The paper considers a method for reducing music noise in a multichannel speech signal based on noise mask estimation. The method is applied to automatic speech recognition in the presence of music noise. Method. The study uses an acoustic model implemented with artificial neural networks and real-life recordings made in reverberant conditions. Main Results. It is shown that the acoustic model is capable of estimating the noise mask on a multichannel mixture for different music genres. Applying this mask to covariance matrix estimation for the MVDR (Minimum Variance Distortionless Response) beamforming algorithm increases recognition accuracy by at least 4.9 % at signal-to-noise ratios of 10–30 dB. Practical Relevance. Estimating MVDR coefficients from a noise mask produced by an acoustic model suppresses non-stationary noise such as music, thus increasing the robustness of automatic speech recognition systems.
Keywords: microphone array, MVDR, acoustic model, noise mask estimation, music noise reduction, automatic speech recognition
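The mask-to-beamformer step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it handles a single frequency bin, uses an oracle noise mask in place of the neural acoustic model, and assumes a known steering vector. All function and variable names (`masked_covariance`, `mvdr_weights`, `d_speech`, and so on) are hypothetical. The sketch computes a mask-weighted noise covariance matrix and the standard MVDR weights w = Φ⁻¹d / (dᴴΦ⁻¹d), using only the Python standard library.

```python
import cmath
import random


def masked_covariance(frames, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    frames: T frames, each a list of M complex STFT values (one per microphone).
    mask:   T weights in [0, 1]; values near 1 mark noise-dominated frames.
    """
    m = len(frames[0])
    cov = [[0j] * m for _ in range(m)]
    for y, w in zip(frames, mask):
        for i in range(m):
            for j in range(m):
                cov[i][j] += w * y[i] * y[j].conjugate()
    total = sum(mask)
    return [[c / total for c in row] for row in cov]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def mvdr_weights(noise_cov, steering, loading=1e-6):
    """MVDR weights w = Phi^-1 d / (d^H Phi^-1 d) with small diagonal loading."""
    n = len(steering)
    loaded = [[noise_cov[i][j] + (loading if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    num = solve(loaded, steering)                                   # Phi^-1 d
    denom = sum(d.conjugate() * v for d, v in zip(steering, num))   # d^H Phi^-1 d
    return [v / denom for v in num]


def beamform(weights, frame):
    """Beamformer output w^H y for one frame."""
    return sum(w.conjugate() * y for w, y in zip(weights, frame))


def cgauss(scale=1.0):
    """Complex Gaussian sample (synthetic data only)."""
    return scale * complex(random.gauss(0, 1), random.gauss(0, 1))


# Synthetic scene (assumed): 4 microphones, speech from broadside,
# a directional "music" source from another angle, weak sensor noise.
random.seed(0)
M = 4
d_speech = [1 + 0j] * M
d_noise = [cmath.exp(-1j * 1.3 * k) for k in range(M)]

frames, mask = [], []
for t in range(200):
    speech_active = t % 2 == 0
    s = cgauss() if speech_active else 0j
    n = cgauss()
    frames.append([s * d_speech[k] + n * d_noise[k] + cgauss(0.05) for k in range(M)])
    mask.append(0.0 if speech_active else 1.0)  # oracle mask, not a model output

weights = mvdr_weights(masked_covariance(frames, mask), d_speech)

noise_frames = [y for y, w in zip(frames, mask) if w == 1.0]
in_pow = sum(abs(y[0]) ** 2 for y in noise_frames) / len(noise_frames)
out_pow = sum(abs(beamform(weights, y)) ** 2 for y in noise_frames) / len(noise_frames)
print("noise power at microphone 0:", in_pow)
print("noise power after MVDR:", out_pow)
```

In this toy setup the beamformer keeps unit gain toward the assumed speech direction while attenuating the directional interferer. In a real system the mask would be estimated per time-frequency bin, and the steering vector would typically be estimated from the data as well.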
Acknowledgements. This work was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.575.21.0132 (ID RFMEFI57517X0132).