Kazuaki OBARA, Kiyoaki AIKAWA, Hideki KAWAHARA
Word Recognition
Using Auditory Model Front-End
Incorporating Spectro-Temporal Masking
Abstract: An auditory model front-end that reflects spectro-temporal
masking characteristics is proposed. The model performs well in a
multi-speaker word recognition system. Recent
auditory perception research shows that the forward masking pattern
spreads more widely along the frequency axis as the masker-signal
interval increases. This spectro-temporal masking characteristic is
modeled and built into a cochlear filter front-end for speech
recognition. The current masking level is calculated as the weighted sum
of the smoothed preceding spectra. The weights decrease and the
smoothing window widens along the frequency axis as the
masker-signal interval increases. The current masked spectrum is obtained
by subtracting the masking levels from the current spectrum. Word
recognition experiments demonstrated that incorporating the masking
effect into the cochlear filter front-end improves recognition
performance. The performance was better than that of traditional
LPC-based word recognizers.
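
The masking computation described above (a weighted sum of smoothed
preceding spectra, subtracted from the current spectrum) can be
illustrated with a minimal sketch. It is not the authors' implementation.
The abstract gives no parameter values, so everything specific here is an
assumption made for illustration: the function names, the moving-average
smoother, the geometric weight decay, the window half-width equal to the
lag in frames, and the defaults max_lag, base_weight and decay.

```python
def smooth(frame, half_width):
    """Moving average along the frequency axis (window truncated at edges)."""
    n = len(frame)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(frame[lo:hi]) / (hi - lo))
    return out


def masked_spectrum(frames, t, max_lag=3, base_weight=0.4, decay=0.5):
    """Subtract a masking level built from preceding frames from frame t.

    frames: list of spectra, one list of channel levels per time frame.
    All parameter defaults are hypothetical, not taken from the paper.
    """
    current = frames[t]
    masking = [0.0] * len(current)
    for lag in range(1, max_lag + 1):
        if t - lag < 0:
            break  # no earlier frame available
        # Assumed schedule: weight shrinks and window widens with the lag.
        weight = base_weight * decay ** (lag - 1)
        smoothed = smooth(frames[t - lag], half_width=lag)
        for i, level in enumerate(smoothed):
            masking[i] += weight * level
    return [c - m for c, m in zip(current, masking)]


# A single spectral peak, then silence, then a flat spectrum.
frames = [[0, 0, 8, 0, 0],
          [0, 0, 0, 0, 0],
          [1, 1, 1, 1, 1]]
one_frame_later = masked_spectrum(frames, 1)
two_frames_later = masked_spectrum(frames, 2)
```

In this toy input the peak in the first frame masks only its neighbouring
channels one frame later, but reaches every channel two frames later with
a smaller weight, mirroring the widening-and-weakening behaviour the
abstract describes. The sketch does not clamp negative results; whether
and how the original model does so is not stated in the abstract.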