TR-A-0118 :1991.6.27

Tatsuya Hirahara and Hitoshi Iwamida

Auditory Spectrograms in HMM Phoneme Recognition

Abstract: Several auditory spectrograms based on the adaptive Q cochlear filter and its relatives are compared in speaker-dependent HMM phoneme recognition tests using clean speech as well as speech degraded by added pink noise. The spectrograms are created using a filter bank, an inner hair cell (IHC) model, and a lateral inhibition (LINH) circuit in different combinations. Eight filter banks built from three types of filters are prepared: (1) a simple band-pass filter with Q=4.5 or Q=30, (2) a conventional fixed Q cochlear filter with Q=4.5 or Q=30, and (3) an adaptive Q cochlear filter with feedback or feedforward control and a short or long adaptation time constant. Each filter bank is composed of 55 channel filters spaced in 1/3 Bark increments and spanning the frequency range from 1 to 18.7 Bark. The IHC model comprises a saturated half-wave rectifier and a short-term adaptation circuit. The recognition task is to classify input tokens into 18 phoneme categories using 5,788 training tokens and 5,773 testing tokens. The results are as follows: (1) The adaptive Q cochlear filter with LINH gives better recognition performance than the other types of filter banks in all training/testing conditions. (2) The LINH effectively improves recognition performance. (3) The IHC model produces no benefit, even for the noisy data set.
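
The channel layout described in the abstract can be illustrated with a minimal sketch. It places 55 channel centres evenly between 1 and 18.7 Bark, which gives a step of about 0.328 Bark, close to the stated 1/3 Bark. The Bark-to-Hz conversion is an assumption: the abstract does not say which formula was used, so Traunmueller's approximation is swapped in here, and all function and constant names are hypothetical.

```python
# Sketch only: evenly spaced channel centres on a Bark scale.
# Assumption: Traunmueller's Bark approximation, z = 26.81*f/(1960+f) - 0.53.
# The report's abstract does not specify its Bark-to-Hz conversion.

N_CHANNELS = 55    # number of channel filters stated in the abstract
BARK_LOW = 1.0     # lower edge of the stated range, in Bark
BARK_HIGH = 18.7   # upper edge of the stated range, in Bark


def hz_to_bark(f: float) -> float:
    """Convert a frequency in Hz to Bark (assumed approximation)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of hz_to_bark: Bark value to frequency in Hz."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def channel_centres_bark(n: int = N_CHANNELS,
                         low: float = BARK_LOW,
                         high: float = BARK_HIGH) -> list[float]:
    """Return n centres spaced evenly from low to high, inclusive."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


centres_bark = channel_centres_bark()
centres_hz = [bark_to_hz(z) for z in centres_bark]

if __name__ == "__main__":
    for index, (z, f) in enumerate(zip(centres_bark, centres_hz), start=1):
        print(f"channel {index:2d}: {z:6.3f} Bark  {f:8.1f} Hz")
```

Under this assumed conversion the centres run from roughly 120 Hz to roughly 5 kHz; a different Bark formula would shift those values.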