Speech Processing Department,
ATR Interpreting Telephony Research Laboratories
Speech Research at ATR
Interpreting Telephony Research Laboratories
Abstract: Speech research activities at the Speech Processing Department of the ATR Interpreting
Telephony Research Laboratories are introduced.
First, speech recognition research activities are summarized as follows:
(1) Hidden Markov phoneme models have been improved and successfully applied to Japanese
phrase utterance recognition combined with the LR predictive parser.
(2) A phoneme segmentation expert based on spectrogram reading knowledge has been
developed.
(3) Time-Delay Neural Networks (TDNN) have been applied to phoneme recognition in word
utterances.
(4) Speaker adaptation algorithms have been improved using separate vector quantization and
fuzzy vector quantization.
Second, the research activities on speech synthesis, voice conversion, and noise reduction are
summarized as follows:
(1) The proposed speech synthesis system is a synthesis-by-rule system based on an optimal
selection of non-uniform synthesis units, which aims at producing natural, high-quality speech.
(2) Voice conversion is a method of changing voice individuality. The conversion method proposed
here makes use of a conventional vector quantization technique. The essential part of this
technique is to build mapping codebooks between two different speakers for such acoustic
parameters as spectrum, pitch frequency, and power level. The conversion experiments show
that this method is effective and promising for the conversion of voice individuality.
(3) Noise reduction is another technique that the interpreting telephony system should
incorporate. It is performed by a four-layered neural network trained with the back-propagation
learning algorithm. The results show that the network can indeed learn to perform noise
reduction, even for speech and noise signals that were not part of the training data.
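
The mapping-codebook idea in item (2) of the second list can be illustrated with a minimal sketch. It is not the implementation described in this work. It assumes that source and target feature frames are already time-aligned (for example by dynamic time warping), that both speakers' codebooks are already trained, that distances are Euclidean, and that each mapped codevector is a histogram-weighted average of target codevectors. The function names `quantize`, `build_mapping_codebook`, and `convert` are hypothetical.

```python
import numpy as np


def quantize(frames, codebook):
    """Return the index of the nearest codevector (Euclidean) for each frame."""
    # distances has shape (num_frames, codebook_size)
    distances = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return distances.argmin(axis=1)


def build_mapping_codebook(src_frames, tgt_frames, src_codebook, tgt_codebook):
    """Build one mapped codevector per source codevector.

    Assumption: src_frames[t] and tgt_frames[t] are time-aligned frames of
    the same utterance spoken by the two speakers.
    """
    src_idx = quantize(src_frames, src_codebook)
    tgt_idx = quantize(tgt_frames, tgt_codebook)

    # hist[i, j] counts how often source codevector i co-occurs with
    # target codevector j over the aligned frames.
    hist = np.zeros((len(src_codebook), len(tgt_codebook)))
    np.add.at(hist, (src_idx, tgt_idx), 1.0)

    totals = hist.sum(axis=1, keepdims=True)
    used = totals[:, 0] > 0

    # Assumption: source codevectors never observed in training fall back
    # to themselves (this requires both codebooks to share one feature space).
    mapping = src_codebook.astype(float)
    mapping[used] = (hist[used] / totals[used]) @ tgt_codebook
    return mapping


def convert(frames, src_codebook, mapping):
    """Replace each source frame by the mapped codevector of its nearest code."""
    return mapping[quantize(frames, src_codebook)]
```

In this sketch the same procedure would be repeated separately for each acoustic parameter (spectrum, pitch frequency, power level), with one mapping codebook per parameter.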