TWO-STEP ALGORITHM OF TRAINING INITIALIZATION FOR ACOUSTIC MODELS BASED ON DEEP NEURAL NETWORKS
For citation: Medennikov I.P. Two-step algorithm of training initialization for acoustic models based on deep neural networks. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2016, vol. 16, no. 2, pp. 379–381. doi:10.17586/2226-1494-2016-16-2-379-381
This paper presents a two-step initialization algorithm for training acoustic models based on deep neural networks. The algorithm aims to reduce the impact of non-speech segments on acoustic model training by lowering the percentage of non-speech examples in the training set. Its effectiveness was evaluated on English spontaneous telephone speech recognition (the Switchboard corpus). The proposed algorithm yielded a 3% relative reduction in word error rate compared with training initialization by restricted Boltzmann machines. The results can be applied in the development of automatic speech recognition systems.
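The abstract does not say how the share of non-speech examples is reduced, so the following is only an illustrative sketch of one possible approach: randomly discarding a fraction of the frames labeled as non-speech before training. It is not the author's two-step procedure. The function names, the label value `"sil"`, and the `keep_fraction` parameter are all assumptions introduced for this example.

```python
import random


def subsample_non_speech(frames, labels, non_speech_label, keep_fraction, seed=0):
    """Keep every speech frame and a random fraction of non-speech frames.

    Hypothetical helper: the selection rule is an assumption, not taken
    from the paper.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    kept_frames, kept_labels = [], []
    for frame, label in zip(frames, labels):
        # Speech frames are always kept; non-speech frames are kept
        # with probability keep_fraction.
        if label != non_speech_label or rng.random() < keep_fraction:
            kept_frames.append(frame)
            kept_labels.append(label)
    return kept_frames, kept_labels


def non_speech_share(labels, non_speech_label):
    """Fraction of examples carrying the non-speech label."""
    if not labels:
        return 0.0
    return sum(1 for label in labels if label == non_speech_label) / len(labels)


if __name__ == "__main__":
    # Toy data: 600 non-speech frames ("sil") and 400 speech frames.
    labels = ["sil"] * 600 + ["speech"] * 400
    frames = list(range(len(labels)))  # stand-ins for feature vectors

    _, new_labels = subsample_non_speech(frames, labels, "sil", keep_fraction=0.25)
    print("non-speech share before:", non_speech_share(labels, "sil"))
    print("non-speech share after: ", non_speech_share(new_labels, "sil"))
```

In a real system the labels would typically come from a forced alignment, and the fraction to keep would be tuned on held-out data; both are left open here.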