TR-I-0053

TR-I-0053 :1988.11

Katsuo Abe and Yoshinori Sagisaka

On the unit selection measure for speech synthesis by rule using multiple synthesis units

Abstract: For speech synthesis by rule, we previously proposed a scheme that uses non-uniform speech synthesis units to obtain the optimum synthesis unit sequence for a desired output speech. In this paper, vowel spectrum variations are analyzed using the LPC-cepstrum distance in order to introduce a quantitative measure for unit selection. Using 5,240 words, the vowel spectral distortion resulting from contextual differences was compared, and the following tendencies were found: (1) The following consonant, the position in the utterance, and the accentuation affect vowel spectral envelopes, in that order of influence. (2) For CVs whose following consonants share the same point or manner of articulation, the spectral distance among the vowels of the CVs is 12% smaller than the average. (3) The vowel spectrum of a word-peripheral CV differs from that of a word-medial CV. Based on these results, a quantitative measure is introduced to represent the spectral similarity of each vowel. With this measure, the unit selection scheme was tested using open data. These experiments not only confirmed that the previously proposed categorical measures are adequate for general unit selection, but also showed that some phoneme combinations should be specially scored for unit selection.
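The abstract does not give the exact form of the LPC-cepstrum distance used in the report. As a minimal sketch only, the Python below assumes the commonly used cepstral distance in dB, CD = (10 / ln 10) * sqrt(2 * sum_k (c1_k - c2_k)^2), computed over cepstral coefficients with c_0 excluded. The function names `cepstral_distance_db` and `mean_pairwise_distance` are hypothetical, and the grouping of vowels by context (for example, by following consonant) is left to the caller.

```python
import math


def cepstral_distance_db(c1, c2):
    """Cepstral distance in dB between two cepstrum vectors.

    Assumption: c1 and c2 hold coefficients c_1..c_p (c_0 excluded) and the
    distance is (10 / ln 10) * sqrt(2 * sum((c1_k - c2_k)^2)). The report
    may define its distance differently.
    """
    if len(c1) != len(c2):
        raise ValueError("cepstrum vectors must have the same order")
    squared = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


def mean_pairwise_distance(group):
    """Average cepstral distance over all pairs of vectors in a group.

    A group might hold the cepstra of one vowel taken from CVs that share
    a context; comparing group means is one way to express similarity.
    """
    pairs = [(i, j) for i in range(len(group)) for j in range(i + 1, len(group))]
    if not pairs:
        return 0.0
    total = sum(cepstral_distance_db(group[i], group[j]) for i, j in pairs)
    return total / len(pairs)
```

Under these assumptions, a smaller mean pairwise distance within a context group than across the whole vowel set would correspond to the kind of tendency reported in item (2).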