SPEAKERS' IDENTIFICATION METHOD BASED ON COMPARISON OF PHONEME LENGTHS STATISTICS

Elena V. Bulgakova, Alexey V. Sholokhov, Natalia A. Tomashenko

2015, VOLUME 15, NUMBER 1 (JANUARY-FEBRUARY)

ISSN 2226-1494 (print), ISSN 2500-0373 (online)

doi: 10.17586/2226-1494-2015-15-1-70-77

For citation: Bulgakova E.V., Sholokhov A.V., Tomashenko N.A. Speakers' identification method based on comparison of phoneme lengths statistics. Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2015, vol. 15, no. 1, pp. 70–77 (in Russian)

Abstract

Subject of research. The paper presents a semi-automatic speaker identification method based on comparing prosodic features, namely statistics of phone lengths. With the recent development of speech technologies, there is growing interest in expert methods of speaker identification by voice that complement existing methods, raising identification reliability while keeping labour intensity low. An efficient solution to this problem is needed to decide reliably whether the voices in audio recordings belong to the same speaker or to different speakers.

Method description. We present a novel algorithm for computing the difference between speakers' voices by comparing statistics of phone and allophone lengths. A distinctive feature of the method is that it can be applied alongside other semi-automatic methods (acoustic, auditory and linguistic), because the features it analyzes are not strongly correlated with theirs. Since preprocessing of the analyzed data is automated, the method allows rapid analysis of long recordings. We describe the operating principles of the automatic speech segmentation module, which produces the acoustic-phonetic labeling used to compute sound-length statistics. The software was developed as a speech data preprocessing tool for expert analysis.
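The abstract does not give the distance formula, so the following is only a minimal sketch of the general idea under stated assumptions: the segmentation output is modeled as hypothetical `(phone, start, end)` tuples in seconds, the per-phone length statistics are the mean and standard deviation, and the distance between two recordings is an assumed normalized mean difference averaged over the phones both recordings share. `phone_length_stats` and `length_distance` are illustrative names, not the authors' software.

```python
from collections import defaultdict
from statistics import mean, stdev


def phone_length_stats(segments, min_count=3):
    """Per-phone length statistics from a phone-level labeling.

    segments: iterable of (phone, start_s, end_s) tuples, a hypothetical
    stand-in for the segmentation module's acoustic-phonetic labeling.
    Returns {phone: (mean_length, std_length, count)} for phones occurring
    at least min_count times (min_count >= 2, since stdev needs two values).
    """
    if min_count < 2:
        raise ValueError("min_count must be at least 2")
    lengths = defaultdict(list)
    for phone, start, end in segments:
        lengths[phone].append(end - start)
    return {
        phone: (mean(ls), stdev(ls), len(ls))
        for phone, ls in lengths.items()
        if len(ls) >= min_count
    }


def length_distance(stats_a, stats_b, eps=1e-6):
    """Assumed distance (not the paper's formula): the mean over shared
    phones of |mean_a - mean_b| divided by the pooled standard deviation."""
    shared = sorted(stats_a.keys() & stats_b.keys())
    if not shared:
        raise ValueError("the recordings share no phones with enough occurrences")
    total = 0.0
    for phone in shared:
        mean_a, std_a, _ = stats_a[phone]
        mean_b, std_b, _ = stats_b[phone]
        pooled = ((std_a ** 2 + std_b ** 2) / 2) ** 0.5
        total += abs(mean_a - mean_b) / max(pooled, eps)
    return total / len(shared)


# Toy labelings: "s" occurs only once in rec_a and is therefore dropped.
rec_a = [("a", 0.00, 0.10), ("n", 0.10, 0.16), ("a", 0.20, 0.32),
         ("n", 0.32, 0.40), ("a", 0.50, 0.58), ("n", 0.60, 0.66),
         ("s", 0.70, 0.80)]
rec_b = [("a", 0.00, 0.15), ("n", 0.15, 0.24), ("a", 0.30, 0.45),
         ("n", 0.45, 0.55), ("a", 0.60, 0.72), ("n", 0.72, 0.80)]
print(length_distance(phone_length_stats(rec_a), phone_length_stats(rec_b)))
```

This sketch does not normalize for speaking rate, so an overall tempo difference between two recordings shows up directly as distance.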

Method testing. The method was tested on a database of 130 recordings of Russian speech by male and female speakers. It achieved a reliability of 71.7% on the female speech recordings and 78.4% on the male speech recordings. The experiments also showed that, of all the features used, the length statistics of vowels and sonorants are the most informative.
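The abstract does not define its reliability measure. As a minimal sketch of how one might compare feature subsets on one's own data, assume reliability is approximated by the accuracy of threshold-based same/different decisions: per-phone statistics (a dict from phone to statistics) are restricted to a phone class, a distance is computed for each recording pair with any phone-length distance, and the resulting trials are scored. The phone inventories `VOWELS` and `SONORANTS` and both function names are hypothetical.

```python
# Illustrative phone classes; this small inventory is an assumption,
# not the phone set used in the paper.
VOWELS = {"a", "o", "u", "e", "i", "y"}
SONORANTS = {"m", "n", "l", "r", "j"}


def restrict(stats, phones):
    """Keep only the per-phone statistics whose phone is in `phones`."""
    return {p: v for p, v in stats.items() if p in phones}


def best_threshold_accuracy(trials):
    """trials: list of (distance, same_speaker) pairs.

    A pair is judged 'same speaker' when distance <= threshold. Returns the
    highest fraction of correct decisions over all thresholds drawn from the
    data, plus -inf (always 'different'). Tuning the threshold on the
    evaluation pairs themselves gives an optimistic estimate.
    """
    if not trials:
        raise ValueError("no trials")
    thresholds = [float("-inf")] + sorted(d for d, _ in trials)
    best = 0.0
    for t in thresholds:
        correct = sum((d <= t) == same for d, same in trials)
        best = max(best, correct / len(trials))
    return best


if __name__ == "__main__":
    stats = {"a": (0.10, 0.02, 3), "s": (0.12, 0.03, 4), "n": (0.07, 0.01, 3)}
    print(restrict(stats, VOWELS | SONORANTS))  # keeps "a" and "n" only
    trials = [(0.1, True), (0.3, False), (0.4, True), (0.8, False)]
    print(best_threshold_accuracy(trials))
```

Running the same scoring on distances computed from `restrict(stats, VOWELS | SONORANTS)` and from the full statistics gives a rough check of which feature subset separates speakers better; a held-out threshold would give a less biased figure.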

Practical relevance. The experimental results show that the proposed method is applicable to speaker recognition in phonoscopic examination.

Keywords: phonoscopic examination, speaker recognition, semi-automatic speaker identification methods, statistics of phone lengths, phone segmentation

Acknowledgements. This work was carried out with state financial support for leading universities of the Russian Federation (subsidy 074-U01).

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License