TR-S-0005 :2000.09.08

宇佐美慶,張勁松

A design of phonetic-balanced transcript for Chinese

Abstract: This technical report presents our efforts to develop phonetic-balanced transcripts for Chinese, which are considered necessary for building a Chinese speech data corpus. The search algorithm is based on a maximum entropy rule, under which an equal frequency of appearance across phonetic contexts is preferred. As preliminary experimental results, we found a 7-utterance set in which nearly all Initials and Finals appear, and a 25-utterance set in which 134 di-phones appear. We intend to use this system to generate more specifically phonetic-balanced transcripts for Chinese in the future.
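
To illustrate the maximum entropy rule mentioned in the abstract, the following is a minimal sketch and not the report's actual implementation. It assumes a greedy search: at each step, the candidate utterance that most increases the entropy of the pooled phonetic-unit distribution is added to the set. The function names `entropy` and `select_balanced`, the greedy strategy, and the representation of each utterance as a list of unit labels (Initials, Finals, or di-phones) are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by unit counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def select_balanced(candidates, set_size):
    """Greedily pick up to `set_size` utterances (hypothetical helper).

    `candidates` maps an utterance id to its list of phonetic units.
    Each step adds the utterance whose units give the pooled counts the
    highest entropy, i.e. the most even coverage of units so far.
    Ties are broken by sorted utterance id so the result is deterministic.
    """
    selected = []
    counts = Counter()
    remaining = dict(candidates)
    while remaining and len(selected) < set_size:
        best_id = max(
            sorted(remaining),
            key=lambda uid: entropy(counts + Counter(remaining[uid])),
        )
        counts.update(remaining.pop(best_id))
        selected.append(best_id)
    return selected, counts
```

Entropy is highest when all units occur equally often, so maximizing it favors utterances that supply units still missing or under-represented in the set. A greedy search of this kind does not guarantee a globally optimal set; it is only one simple way to apply the rule.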