METHOD OF HIGH-QUALITY SPEECH SYNTHESIS WITH A SMALL DATABASE USAGE
We propose an approach to synthesizing high-quality speech from a small initial speech database. A robust solution to this problem is vital for voice restoration, i.e. recovering lost fragments of recordings of a well-known person (e.g. an actor) from the speech material that is available. The proposed TTS (text-to-speech) system is a hybrid that combines the advantages of HMM-based and Unit Selection-based TTS systems. The approach relies on statistical models of intonation parameters, which make it possible to preserve the speaker's pronunciation in synthesized speech. We describe how the database is prepared and how the shortage of original speech material for model training is overcome. Dedicated algorithms for concatenating and modifying speech elements adjust element parameters to the required values, ensure overall tonal smoothness, and reduce spectral distortion at the boundaries of concatenated elements. Listening tests confirmed the effectiveness of the proposed methods and showed that high-quality speech synthesis is possible even with a small speech database (as little as one hour of speech).
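The abstract does not specify the concatenation algorithms, so the following is only an illustrative sketch of two generic boundary techniques, not the authors' method. It assumes a pitch (F0) contour sampled per frame and a waveform sampled per sample. `smooth_f0_join` removes the pitch jump at a join by splitting the mismatch and fading it in over a few frames on each side, so the contour is continuous at the boundary. `crossfade` overlap-adds two waveform segments with a raised-cosine window, a simple and common way to soften the discontinuity at a join. Both function names and the `span` and `overlap` parameters are hypothetical.

```python
import math
from typing import List


def smooth_f0_join(left: List[float], right: List[float], span: int) -> List[float]:
    """Hypothetical helper: join two F0 contours (Hz) without a pitch jump.

    Half of the mismatch at the boundary is added to the end of `left` and
    half is subtracted from the start of `right`, each ramped linearly over
    at most `span` frames, so both sides meet at the same value.
    """
    if not left or not right or span <= 0:
        return left + right
    jump = right[0] - left[-1]
    out_left = list(left)
    out_right = list(right)
    n_l = min(span, len(left))
    n_r = min(span, len(right))
    for i in range(n_l):
        w = (i + 1) / n_l  # rises to 1 at the last frame of `left`
        out_left[len(left) - n_l + i] += 0.5 * jump * w
    for i in range(n_r):
        w = 1 - i / n_r  # equals 1 at the first frame of `right`, then falls
        out_right[i] -= 0.5 * jump * w
    return out_left + out_right


def crossfade(a: List[float], b: List[float], overlap: int) -> List[float]:
    """Hypothetical helper: overlap-add two waveform segments.

    The last `overlap` samples of `a` are blended with the first `overlap`
    samples of `b` using complementary raised-cosine weights that sum to 1.
    The result has len(a) + len(b) - overlap samples.
    """
    overlap = min(overlap, len(a), len(b))
    if overlap <= 0:
        return a + b
    out = a[: len(a) - overlap]
    for i in range(overlap):
        # Fade-in weight for `b`, strictly between 0 and 1.
        w = 0.5 - 0.5 * math.cos(math.pi * (i + 0.5) / overlap)
        out.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    out.extend(b[overlap:])
    return out


if __name__ == "__main__":
    # Two flat pitch contours with a 20 Hz jump between them.
    print(smooth_f0_join([100.0] * 5, [120.0] * 5, span=3))
    print(len(crossfade([0.0] * 10, [1.0] * 10, overlap=4)))
```

In a complete synthesizer, the smoothed F0 values would then drive a pitch-modification step (for example, a PSOLA-style method) applied to each selected unit; that step is omitted here.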