TR-I-0061: December 1988

Speech Processing Department, ATR Interpreting Telephony Research Laboratories

Speech Research at ATR Interpreting Telephony Research Laboratories

Abstract: This report introduces the speech research activities of the Speech Processing Department of the ATR Interpreting Telephony Research Laboratories.

First, the speech recognition research activities are summarized as follows:
(1) Hidden Markov phoneme models have been improved and successfully applied to Japanese phrase utterance recognition in combination with the LR predictive parser.
(2) A phoneme segmentation expert system based on spectrogram-reading knowledge has been developed.
(3) Time-Delay Neural Networks (TDNNs) have been applied to phoneme recognition in word utterances.
(4) Speaker adaptation algorithms have been improved using separate vector quantization and fuzzy vector quantization.

Second, the research activities on speech synthesis, voice conversion, and noise reduction are summarized as follows:
(1) The proposed speech synthesis system performs synthesis by rule based on an optimal selection of non-uniform synthesis units, with the aim of producing natural, high-quality speech.
(2) Voice conversion is a method for changing the individuality of a voice. The conversion method proposed here makes use of the conventional vector quantization technique. The essential part of this technique is the construction of mapping codebooks between two different speakers for acoustic parameters such as spectrum, pitch frequency, and power level. The conversion experiments show that this method is effective and promising for converting voice individuality.
(3) Noise reduction is another technique that the interpreting telephony system should incorporate. It is performed by a four-layered neural network trained with the back-propagation learning algorithm. The results show that the network can indeed learn to perform noise reduction, even for speech and noise signals that were not part of the training data.
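Item (4) of the recognition summary names fuzzy vector quantization as one ingredient of the improved speaker adaptation but does not give its form. The following is a hedged sketch that assumes the common fuzzy c-means membership rule. Instead of mapping an input vector to a single nearest codevector, it assigns graded memberships to all codevectors and then uses those memberships to weight a mapped codebook. The function names, the fuzziness exponent m = 2, the toy codebooks, and the use of NumPy are illustrative assumptions. This is not the laboratory's actual algorithm.

```python
import numpy as np

def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy VQ memberships of vector x in each codevector (fuzzy c-means form).

    u_k = 1 / sum_j (d_k / d_j) ** (1 / (m - 1)), where d is the squared
    Euclidean distance and m > 1 is the fuzziness exponent (hypothetical value).
    """
    d = ((codebook - x) ** 2).sum(axis=1)
    if np.any(d == 0):
        # x coincides with a codevector: give that codevector full membership.
        u = (d == 0).astype(float)
        return u / u.sum()
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum()  # memberships sum to 1

def fuzzy_decode(u, mapped_codebook):
    """Reconstruct a vector as the membership-weighted sum of mapped codevectors."""
    return u @ mapped_codebook

# Toy example (hypothetical data): a 2-entry codebook for the new speaker and
# an assumed mapping of those entries into the reference speaker's space.
input_cb = np.array([[0.0, 0.0], [2.0, 0.0]])
mapped_cb = np.array([[0.0, 0.0], [4.0, 4.0]])

u = fuzzy_memberships(np.array([0.5, 0.0]), input_cb)  # u ≈ [0.9, 0.1]
adapted = fuzzy_decode(u, mapped_cb)                    # adapted ≈ [0.4, 0.4]
```

Because the decoded vector blends several codevectors, the adapted output varies smoothly with the input. A hard VQ decoder, by contrast, can only emit one of the codebook entries.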
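The mapping-codebook idea in the voice conversion item can be illustrated with a minimal NumPy sketch. It assumes that the two speakers' training utterances have already been time-aligned frame by frame, for example by dynamic time warping, which the abstract does not mention. It also assumes that both speakers use parameter vectors of the same dimension. Each mapping entry is built as the histogram-weighted mean of the target-speaker codevectors that co-occur with a source codeword. The function names, the choice to leave unseen codewords unchanged, and the toy data are illustrative assumptions. In the scheme the abstract describes, the same procedure would be repeated for each parameter (spectrum, pitch frequency, power level).

```python
import numpy as np

def vq_encode(frames, codebook):
    """Index of the nearest codevector (squared Euclidean distance) for each frame."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def build_mapping_codebook(src_frames, tgt_frames, src_cb, tgt_cb):
    """Source-to-target mapping codebook from frame pairs assumed to be time-aligned."""
    src_idx = vq_encode(src_frames, src_cb)
    tgt_idx = vq_encode(tgt_frames, tgt_cb)
    # hist[i, j] counts how often source codeword i is paired with target codeword j.
    hist = np.zeros((len(src_cb), len(tgt_cb)))
    np.add.at(hist, (src_idx, tgt_idx), 1.0)
    # Source codewords never seen in training are left unchanged (assumes equal dimensions).
    mapping = src_cb.astype(float).copy()
    for i in range(len(src_cb)):
        total = hist[i].sum()
        if total > 0:
            mapping[i] = hist[i] @ tgt_cb / total  # histogram-weighted mean of target codevectors
    return mapping

def convert(frames, src_cb, mapping):
    """Replace each input frame by the mapped codevector of its nearest source codeword."""
    return mapping[vq_encode(frames, src_cb)]

# Toy 2-D "spectra": two codewords per speaker and four aligned training frames.
src_cb = np.array([[0.0, 0.0], [10.0, 10.0]])
tgt_cb = np.array([[1.0, 1.0], [20.0, 20.0]])
src_frames = np.array([[0.1, 0.0], [9.0, 10.0], [0.0, 0.2], [10.0, 9.0]])
tgt_frames = np.array([[1.0, 1.1], [20.0, 19.0], [1.0, 1.0], [19.0, 20.0]])

mapping = build_mapping_codebook(src_frames, tgt_frames, src_cb, tgt_cb)
converted = convert(np.array([[0.5, 0.5], [11.0, 9.5]]), src_cb, mapping)
# mapping == [[1, 1], [20, 20]]; converted == [[1, 1], [20, 20]]
```

If a source codeword is paired with several target codewords, its mapping entry becomes a weighted blend of them instead of a single target vector.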
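The noise reduction item describes a four-layered network trained by back-propagation. The sketch below reads "four-layered" as an input layer, two hidden layers, and an output layer. It trains the network with full-batch gradient descent to map noisy waveform frames to clean ones. NumPy, the tanh hidden units, the layer sizes, the learning rate, and the synthetic sine-plus-Gaussian-noise data are all illustrative assumptions. The abstract does not specify the input representation or the training setup.

```python
import numpy as np

def init_net(sizes, rng):
    """(W, b) pairs for a fully connected net; sizes = (in, h1, h2, out) gives four layers of units."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Activations of every layer: tanh hidden units, linear output units."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if i < len(params) - 1 else z)
    return acts

def train_step(params, x, y, lr):
    """One back-propagation step on the mean squared error between the output and y."""
    acts = forward(params, x)
    err = acts[-1] - y
    loss = float(np.mean(err ** 2))
    delta = 2.0 * err / err.size  # dLoss/dz at the (linear) output layer
    grads = []
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            # Propagate through tanh: d tanh(z)/dz = 1 - tanh(z) ** 2.
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
    grads.reverse()
    new_params = [(W - lr * gW, b - lr * gb) for (W, b), (gW, gb) in zip(params, grads)]
    return new_params, loss

# Synthetic training data: 16-sample frames of clean sinusoids plus Gaussian noise.
rng = np.random.default_rng(0)
frame_len, n_frames = 16, 512
t = np.arange(frame_len)
freqs = rng.uniform(0.1, 0.5, size=(n_frames, 1))
phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_frames, 1))
clean = np.sin(freqs * t + phases)
noisy = clean + 0.3 * rng.normal(size=clean.shape)

params = init_net((frame_len, 32, 32, frame_len), rng)
losses = []
for _ in range(300):
    params, loss = train_step(params, noisy, clean, lr=0.1)
    losses.append(loss)

denoised = forward(params, noisy)[-1]  # network estimate of the clean frames
```

To mirror the abstract's test on speech and noise outside the training data, one could generate a fresh set of frames with a different random seed and compare the network's output error against the error of the raw noisy input.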