TR-I-0209

TR-I-0209: 1991.3.29

Hidefumi Sawai

Connectionist Large-Vocabulary Continuous Speech Recognition

Abstract: This paper describes connectionist approaches to large-vocabulary continuous speech recognition that integrate speech recognition with language processing. The speech recognition part consists of large phonemic Time-Delay Neural Networks (TDNNs), which can automatically spot all Japanese phonemes simply by scanning the input speech. The language processing part is a predictive LR parser, which predicts subsequent phonemes from the phonemes processed so far. Recognition experiments using ATR's large-vocabulary speech database of 5,240 words and the "Conference Registration" task yielded high recognition performance. Furthermore, we discuss some extensions of the current system for robust speech recognition, speaker adaptation, and speaker-independent recognition.
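
The interplay between phoneme spotting and phoneme prediction described in the abstract can be illustrated with a minimal sketch. It is not the system from the report: a tiny pronunciation lexicon stands in for the LR grammar, a lookup of precomputed scores stands in for the TDNN outputs, and the search is greedy. The names `predict_next`, `recognize`, `toy_score`, and the three-word lexicon are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Assumption: a word -> phoneme-sequence table replaces the LR grammar.
Lexicon = Dict[str, Tuple[str, ...]]
Window = Dict[str, float]  # assumption: precomputed phoneme scores per window


def predict_next(prefix: Sequence[str], lexicon: Lexicon) -> Set[str]:
    """Return the phonemes that can extend `prefix` toward some lexicon word."""
    n = len(prefix)
    return {
        phones[n]
        for phones in lexicon.values()
        if len(phones) > n and tuple(phones[:n]) == tuple(prefix)
    }


def recognize(
    windows: Sequence[Window],
    score: Callable[[Window, str], float],
    lexicon: Lexicon,
) -> List[str]:
    """Greedy decoding: for each window, score only the predicted phonemes."""
    prefix: List[str] = []
    for window in windows:
        candidates = predict_next(prefix, lexicon)
        if not candidates:
            break  # no lexicon word continues this prefix
        # Sort first so that ties are broken deterministically.
        best = max(sorted(candidates), key=lambda p: score(window, p))
        prefix.append(best)
    return prefix


def toy_score(window: Window, phoneme: str) -> float:
    """Hypothetical stand-in for a TDNN output activation."""
    return window.get(phoneme, 0.0)


lexicon: Lexicon = {
    "kaigi": ("k", "a", "i", "g", "i"),
    "kai": ("k", "a", "i"),
    "kane": ("k", "a", "n", "e"),
}
windows = [
    {"k": 0.9, "g": 0.8},
    {"a": 0.7},
    {"n": 0.6, "i": 0.3},
    {"e": 0.9},
]
result = recognize(windows, toy_score, lexicon)
print(result)
```

Restricting the scorer to predicted phonemes is the point of the sketch: in the first window "g" scores well, but it is never considered because no lexicon entry starts with it. A full system would keep multiple hypotheses rather than a single greedy path.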