SEMI-AUTOMATIC SPEAKER VERIFICATION SYSTEM
For citation: Bulgakova E.V., Sholokhov A.V. Semi-automatic speaker verification system. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2016, vol. 16, no. 2, pp. 284–289. doi:10.17586/2226-1494-2016-16-2-284-289
Subject of Research. The paper presents a semi-automatic speaker verification system based on comparing formant values, phone length statistics, and melodic characteristics. Advances in speech technology have increased interest in expert speaker verification systems that combine high reliability with low labour intensity by automating data processing for expert analysis. System Description. We describe a novel system that assesses whether speaker voices are similar or distinct by comparing phone length statistics, formant features, and melodic characteristics. The system fuses several methods, and the weak correlation between the features they analyze reduces the speaker recognition error rate. An advantage of the system is that recordings can be analyzed rapidly, since data preprocessing and decision making are automated. We describe how each method works and how the methods are fused to combine their decisions. Main Results. We tested the system on a speech database of 1190 target trials and 10450 non-target trials containing Russian speech from male and female speakers. The recognition accuracy of the system is 98.59% on recordings of male speech and 96.17% on recordings of female speech. The experiments also showed that the formant method is the most reliable of the methods used. Practical Significance. The experimental results show that the proposed system is applicable to the speaker recognition task in the course of phonoscopic examination.
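The abstract does not state how the decisions of the three methods are combined or how accuracy was computed. As an illustration only, the sketch below assumes a weighted linear fusion of per-method similarity scores in [0, 1] followed by a fixed threshold, and measures false rejection and false acceptance rates over target and non-target trials. The weights, the threshold, the function names, and the toy trials are all hypothetical; they are not taken from the paper or its database.

```python
# Hypothetical weights: the abstract only says the formant method was the most
# reliable, so it gets the largest weight here. The values are assumptions.
WEIGHTS = {"formant": 0.5, "duration": 0.25, "pitch": 0.25}
THRESHOLD = 0.5  # assumed decision threshold on the fused score


def fuse(scores):
    """Weighted sum of per-method similarity scores (each assumed in [0, 1])."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def verify(scores):
    """Accept the trial as 'same speaker' if the fused score reaches the threshold."""
    return fuse(scores) >= THRESHOLD


def error_rates(trials):
    """trials: list of (scores, is_target) pairs.

    Returns (false_rejection_rate, false_acceptance_rate).
    """
    targets = [s for s, is_target in trials if is_target]
    nontargets = [s for s, is_target in trials if not is_target]
    frr = sum(not verify(s) for s in targets) / len(targets)
    far = sum(verify(s) for s in nontargets) / len(nontargets)
    return frr, far


# Made-up toy trials for illustration; not data from the paper.
toy_trials = [
    ({"formant": 0.9, "duration": 0.6, "pitch": 0.7}, True),
    ({"formant": 0.8, "duration": 0.3, "pitch": 0.4}, True),
    ({"formant": 0.2, "duration": 0.7, "pitch": 0.3}, False),
    ({"formant": 0.1, "duration": 0.2, "pitch": 0.6}, False),
]

frr, far = error_rates(toy_trials)
print(f"FRR = {frr:.2%}, FAR = {far:.2%}")
```

In the third toy trial the duration score alone would suggest a match, but the fused score stays below the threshold. This is the kind of benefit that weakly correlated features can bring to a fusion scheme.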