Katsuo Abe and Yoshinori Sagisaka
On the unit selection measure for speech synthesis by
rule using multiple synthesis units
Abstract: For speech synthesis by rule, we previously proposed a synthesis
scheme using non-uniform speech synthesis units to obtain the optimum synthesis
unit sequence for a desired output speech. In this paper, vowel spectrum variations
are analyzed using the LPC-cepstrum distance to introduce a quantitative measure
for unit selection. Using 5,240 words, the vowel spectral distortion resulting from
contextual differences was compared, and the following tendencies were
found: (1) The following consonant, the position in the utterance, and the
accentuation affect vowel spectral envelopes, in that order. (2) For CVs whose
following consonants have the same point or manner of articulation, the spectral
distance among the vowels of the CVs is 12% smaller than the average. (3) The
vowel spectrum of a word-peripheral CV differs from that of a word-medial CV.
Based on these results, a quantitative measure is introduced to represent spectral
similarities of each vowel. With this measure, the unit selection scheme was tested
using open data. These experiments not only confirmed that the
previously proposed categorical measures are adequate for general unit selection,
but also showed that some phoneme combinations should be scored specially for unit
selection.
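
As an illustration of the LPC-cepstrum distance mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes an all-pole model 1 / (1 - sum_k a_k z^-k), derives cepstral coefficients from the predictor coefficients with the standard recursion, and uses a plain Euclidean distance between the two cepstral vectors. The abstract does not state the exact distance definition, the number of coefficients, or any scaling, so those choices, the function names, and the toy coefficient values are all assumptions.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Convert predictor coefficients a_1..a_p (given as a[0..p-1]) of the
    all-pole model 1 / (1 - sum_k a_k z^-k) into cepstral coefficients
    c_1..c_n_ceps using the standard recursion (assumed sign convention)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_{n-k}, with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors of equal length
    (an assumed definition; other scalings are common)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


# Hypothetical predictor coefficients for the same vowel in two contexts.
vowel_in_context_a = [1.20, -0.50]
vowel_in_context_b = [1.10, -0.45]

d = cepstral_distance(
    lpc_to_cepstrum(vowel_in_context_a, 12),
    lpc_to_cepstrum(vowel_in_context_b, 12),
)
print(f"cepstral distance: {d:.4f}")
```

In a unit selection setting, a smaller distance between the vowel of a candidate unit and the vowel expected in the target context would indicate a better spectral match.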