TR-I-0174 :1990

Alain BIEM & Masahide SUGIYAMA

Study on Combining HMMs and Neural Network Models

TDNN-HMM for Phoneme Recognition

Abstract: This report focuses on the combination of Hidden Markov Models (HMMs) and neural networks. HMMs are a stochastic approach to continuous speech, well suited to coping with the variability of speech. Neural networks, on the other hand, have shown high classification power for short speech utterances. We can therefore try to build a system that combines the advantages of HMMs and neural networks. We first review some of the most interesting recent work in this field, and then discuss a new idea: building a codebook from the TDNN output units and training HMMs using the Fuzzy-VQ algorithm. We trained several discrete HMMs for the /b/, /d/, /g/ recognition task using just one TDNN-generated codebook and achieved a recognition rate of 97.2%. Close interpretation reveals that the quality of the extracted feature representations could be more important than the amount of data used for training as well as the discrimination power of the neural network used here as a preprocessor.
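As a rough illustration of the Fuzzy-VQ idea mentioned in the abstract, the sketch below computes soft codeword memberships for one feature vector and uses them to weight the discrete emission probabilities of an HMM state. It is a minimal sketch under stated assumptions, not the procedure of this report: the membership formula is the common fuzzy c-means form, and the function names, the fuzziness value of 2.0, the Euclidean distance, and the three-dimensional codebook (imagined here as living in the space of the /b/, /d/, /g/ output units) are all hypothetical.

```python
import math


def fuzzy_vq_memberships(x, codebook, fuzziness=2.0):
    """Return one membership weight per codeword; the weights sum to 1.

    Assumption: fuzzy c-means style memberships, i.e. weights proportional
    to distance ** (-2 / (fuzziness - 1)), with fuzziness > 1.
    """
    dists = [math.dist(x, c) for c in codebook]

    # If x coincides with one or more codewords, share the weight among them
    # to avoid dividing by zero.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(codebook))]

    exponent = 2.0 / (fuzziness - 1.0)
    inverse = [d ** (-exponent) for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuzzy_emission(memberships, discrete_emission_row):
    """Membership-weighted emission probability for one HMM state.

    discrete_emission_row[k] is the state's probability of emitting
    codeword k, as in an ordinary discrete HMM.
    """
    return sum(u * b for u, b in zip(memberships, discrete_emission_row))


# Hypothetical codebook: one codeword per output unit (illustrative only).
codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame = [0.8, 0.1, 0.1]  # made-up output activations for one frame

memberships = fuzzy_vq_memberships(frame, codebook)
state_emissions = [0.7, 0.2, 0.1]  # made-up discrete emission probabilities
likelihood = fuzzy_emission(memberships, state_emissions)
```

Compared with hard vector quantization, which would assign the frame to its single nearest codeword, the soft memberships let every codeword contribute to the state likelihood in proportion to its closeness.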