Tatsuya Hirahara and Hitoshi Iwamida
Auditory Spectrograms in HMM
Phoneme Recognition
Abstract: Several auditory spectrograms based on the adaptive Q
cochlear filter and its relatives are compared in speaker-dependent
HMM phoneme recognition tests using clean speech as well as
speech degraded by adding pink noise. These
spectrograms are created using a filter bank, an inner hair cell
(IHC) model and a lateral inhibition (LINH) circuit, in different
combinations. Eight different filter banks with three different
types of filters are prepared: (1) a simple band-pass filter with
Q = 4.5 and 30, (2) a conventional fixed Q cochlear filter with
Q = 4.5 and 30, and (3) an adaptive Q cochlear filter with
feedback/feedforward control and a short/long adaptation time
constant. Each filter bank is composed of 55 channel filters
spaced at 1/3-Bark increments and spanning the frequency range
from 1 to 18.7 Bark. The IHC model consists of a saturated
half-wave rectifier and a short-term adaptation circuit. The
recognition task is to classify
input tokens into 18 phoneme categories using 5,788 training
tokens and 5,773 testing tokens. Results are as follows: (1) The
adaptive Q cochlear filter with LINH gives better recognition
performance than the other types of filter banks in all
training/testing conditions. (2) The LINH effectively improves
recognition performance. (3) The IHC model produces no benefit
even for the noisy data set.
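
As a rough illustration of the channel layout described in the
abstract, the following Python sketch computes 55 center
frequencies spanning 1 to 18.7 Bark, i.e. steps of roughly 1/3
Bark. The Bark-to-Hz conversion shown (Traunmuller's 1990
approximation) is an assumption; the paper does not state which
Bark formula its filter bank uses.

    import numpy as np

    def bark_to_hz(z):
        # Traunmuller (1990) inverse Bark approximation.
        # Assumed here; the paper does not specify its conversion.
        return 1960.0 * (z + 0.53) / (26.28 - z)

    # 55 channels from 1 to 18.7 Bark, ~1/3 Bark per step.
    centers_bark = np.linspace(1.0, 18.7, 55)
    centers_hz = bark_to_hz(centers_bark)

    print(centers_bark[1] - centers_bark[0])  # ~0.33 Bark per step
    print(centers_hz[0], centers_hz[-1])      # ~119 Hz to ~5 kHz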
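
The abstract names the two components of the IHC model (a
saturated half-wave rectifier and a short-term adaptation
circuit) without giving their equations. The sketch below is one
generic way such a stage is often realized, not the paper's
actual model; the saturation law, time constant, and adaptation
gain are assumptions chosen only for illustration.

    import numpy as np

    def ihc_stage(x, fs, sat=1.0, tau=0.04):
        # Minimal IHC-like sketch: saturating half-wave rectifier
        # followed by first-order short-term adaptation.
        r = np.maximum(x, 0.0)          # half-wave rectification
        r = r / (r + sat)               # soft saturation
        a = np.exp(-1.0 / (tau * fs))   # adaptation smoothing coefficient
        state = 0.0
        y = np.empty_like(r)
        for n in range(len(r)):
            state = a * state + (1.0 - a) * r[n]  # running average of drive
            y[n] = max(r[n] - 0.5 * state, 0.0)   # emphasize onsets, adapt to steady input
        return y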