TR-I-0090 (1989.7.25)

Hidefumi Sawai, Alex Waibel, Patrick Haffner, Masanori Miyatake and Kiyohiro Shikano

Parallelism, Hierarchy, Scaling in Time-Delay Neural Networks for Spotting Phonemes and CV-Syllables

Abstract: Syllable or phoneme spotting, if reliably achieved, offers a good solution to the problem of spoken-word and/or continuous speech recognition. We previously showed that the Time-Delay Neural Network (TDNN) provided excellent recognition performance for all phonemic subcategories (nasals, fricatives, vowels, etc.). To extend this encouraging performance to the recognition of all phonemes and to word and continuous speech recognition, we present several techniques. First, we show that it is indeed possible to scale up the TDNN to a large phonemic TDNN that discriminates all phonemes, without loss of recognition performance and without requiring an excessive number of training tokens. Second, we propose fast back-propagation learning methods that make it possible to train a large phonemic TDNN within 1.5 hours. Finally, we show several TDNN-based methods for spotting Japanese CV syllables and phonemes in input speech, including a TDNN constructed to discriminate a single CV syllable or phoneme. Syllable and phoneme spotting experiments show excellent results, with spotting rates better than 96.7% and 92% correct, respectively. These spotting techniques prove to be a good step toward continuous speech recognition.
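The abstract does not spell out the network or the spotting procedure, so the following is only a minimal, hypothetical Python sketch of the general idea behind TDNN-based spotting, not the method of this report. It assumes 16 spectral coefficients per frame, hidden layers of 8 and 3 units with 3- and 5-frame windows, and a single output unit with a 9-frame window (a 15-frame receptive field); these sizes, the untrained random weights, the names `time_delay_layer`, `random_layer`, `tdnn_scores` and `spot`, and the threshold-plus-local-peak decision rule are all illustrative assumptions. Training, including the fast back-propagation methods mentioned above, is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def time_delay_layer(frames, weights, biases):
    """One time-delay layer: each unit sees a window of consecutive frames.

    frames:  T frames, each a list of D features.
    weights: U units x K delay taps x D weights.
    biases:  U biases.
    The same weights are reused at every time shift (a 1-D convolution over
    time), so the layer returns T - K + 1 frames of U activations.
    """
    window = len(weights[0])
    out = []
    for t in range(len(frames) - window + 1):
        acts = []
        for unit_w, b in zip(weights, biases):
            s = b
            for k in range(window):
                s += sum(w * x for w, x in zip(unit_w[k], frames[t + k]))
            acts.append(sigmoid(s))
        out.append(acts)
    return out


def random_layer(n_units, window, n_inputs, rng):
    """Hypothetical helper: untrained weights, for shape illustration only."""
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(window)]
               for _ in range(n_units)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_units)]
    return weights, biases


def tdnn_scores(frames, layers):
    """Scan the whole utterance; return one target-vs-rest score per shift."""
    h = frames
    for weights, biases in layers:
        h = time_delay_layer(h, weights, biases)
    return [acts[0] for acts in h]  # single output unit


def spot(scores, threshold):
    """Report shifts whose score reaches the threshold and is a local peak."""
    hits = []
    for t, s in enumerate(scores):
        left = scores[t - 1] if t > 0 else -math.inf
        right = scores[t + 1] if t + 1 < len(scores) else -math.inf
        if s >= threshold and s >= left and s > right:
            hits.append(t)
    return hits


rng = random.Random(0)
layers = [
    random_layer(8, 3, 16, rng),  # hidden 1: 8 units, 3-frame window, 16 coefficients
    random_layer(3, 5, 8, rng),   # hidden 2: 3 units, 5-frame window
    random_layer(1, 9, 3, rng),   # output: 1 unit, 9-frame window -> 15-frame field
]
speech = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(50)]
scores = tdnn_scores(speech, layers)
print(len(scores))  # 36 = 50 - 15 + 1 shifts
print(spot(scores, threshold=0.5))  # untrained weights: hits are arbitrary
```

Because the same weights are applied at every time shift, feeding the network an input that starts one frame later simply shifts the score sequence by one position. This shift invariance is what makes it natural to scan a single-target TDNN along continuous input and pick out peaks as spotted syllables or phonemes.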