呉義堅, 河井恒, 倪晋富, 王仁華
Application of HMM Techniques to Speech Synthesis
Abstract: This report consists of three major parts. The first part is HMM-based segmentation, which combines discriminative training based on minimum segmentation error with explicit duration modeling. The second part is HMM-based prosody modeling for Chinese speech synthesis. The contextual features and the question set are designed according to the characteristics of Chinese. We also improve tree-based clustering by considering the space weight and the meaning of the questions. The last part is automatic detection of Japanese vowel devoicing for corpus construction. The implied likelihood differences are extracted from the recognition process and used as the voicing measure. We also apply discriminative training to the voiced/devoiced HMMs and incorporate voicing features, including autocorrelation, energy, and duration, to improve detection performance.