DOI: 10.17586/2226-1494-2015-15-5-886-892


A. L. Oleinik

Article in Russian

For citation: Oleinik A.L. Application of Partial Least Squares regression for audio-visual speech processing and modeling. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2015, vol. 15, no. 5, pp. 886–892


Subject of Research. The paper addresses the problem of reconstructing lip region images from a speech signal by means of Partial Least Squares (PLS) regression. Such problems arise in the development of audio-visual speech processing methods. Audio-visual speech consists of acoustic and visual components (called modalities). Applications of audio-visual speech processing methods include joint modeling of voice and lip movement dynamics, synchronization of audio and video streams, emotion recognition, and liveness detection. Method. Partial Least Squares regression was applied to solve the posed problem. This method extracts components of the initial data that have high covariance, and these components are then used to build a regression model. The advantage of this approach is that it achieves two goals at once: identifying latent relationships between components of the initial data (e.g. the speech signal and the lip region image) and approximating one component of the initial data as a function of another. Main Results. An experiment on reconstructing lip region images from the speech signal was carried out on the VidTIMIT audio-visual speech database. The results showed that Partial Least Squares regression is capable of solving the reconstruction problem. Practical Significance. The findings suggest that Partial Least Squares regression is applicable to a wide variety of audio-visual speech processing problems, from synchronization of audio and video streams to liveness detection.
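
The abstract does not give implementation details, so the following is only a minimal sketch of generic PLS regression, not the author's code. It assumes that each row of `X` is a speech feature vector and each row of `Y` is a vectorized lip region image. The function names `pls_fit` and `pls_predict`, the synthetic data and all dimensions are hypothetical. Each component is taken as the direction in `X` with maximal covariance with `Y`, which is one common way to compute PLS.

```python
import numpy as np


def pls_fit(X, Y, n_components):
    """Fit PLS regression so that Y ~ (X - x_mean) @ B + y_mean."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean        # centred copies, deflated below
    W, P, C = [], [], []
    for _ in range(n_components):
        # Weight vector: leading left singular vector of X^T Y, i.e. the
        # direction in X whose projection has maximal covariance with Y.
        U = np.linalg.svd(Xk.T @ Yk, full_matrices=False)[0]
        w = U[:, 0]
        t = Xk @ w                         # latent score of every sample
        tt = t @ t
        p = Xk.T @ t / tt                  # X loading
        c = Yk.T @ t / tt                  # Y loading (Y regressed on t)
        Xk = Xk - np.outer(t, p)           # remove the explained part
        Yk = Yk - np.outer(t, c)
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = (np.column_stack(m) for m in (W, P, C))
    # Regression coefficients in the original X space: B = W (P^T W)^-1 C^T
    B = W @ np.linalg.solve(P.T @ W, C.T)
    return B, x_mean, y_mean


def pls_predict(X_new, B, x_mean, y_mean):
    """Reconstruct Y (e.g. lip image vectors) from new X (e.g. speech features)."""
    return (X_new - x_mean) @ B + y_mean


# Synthetic stand-in data (assumption): 50 frames, 4 audio features,
# 6 "pixels" per lip image, linked by a hypothetical linear map A.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 6))
X_train = rng.normal(size=(50, 4))
Y_train = X_train @ A

B, x_mean, y_mean = pls_fit(X_train, Y_train, n_components=4)
X_new = rng.normal(size=(5, 4))
Y_hat = pls_predict(X_new, B, x_mean, y_mean)
print("max reconstruction error:", np.abs(Y_hat - X_new @ A).max())
```

In this toy setting the number of components equals the number of audio features, so PLS coincides with ordinary least squares. With real speech and image data one would normally keep far fewer components than features and choose that number by validation.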

Keywords: audio-visual speech processing, bimodal speech systems, Partial Least Squares, PLS, subspace methods, regression.

Acknowledgements. This work was carried out with government financial support for the leading universities of the Russian Federation (grant 074-U01). The author expresses his sincere appreciation to Professor Georgy Kukharev, his scientific adviser, and Yuri Matveev, Head of the SIS Department, for their critical remarks and advice, which significantly improved the paper.
