TR-H-0007: 1993.3.31

Kazuaki OBARA, Kiyoaki AIKAWA, Hideki KAWAHARA

Speaker-Independent Speech Recognition Using an Auditory Model Front End That Incorporates the Spectro-Temporal Masking Effect

Abstract: Speaker-independent speech recognition experiments using an auditory model front end with a spectro-temporal masking model demonstrated improved recognition performance: the proposed front end outperformed both auditory front ends without the masking model and traditional LPC-based front ends. An auditory model front end composed of an adaptive Q cochlear filter-bank incorporating spectro-temporal masking has been proposed [J. Acoust. Soc. Am., Vol. 92, No. 4, Pt. 2, p. 2476, 5pSP8, Oct. 1992]. The spectro-temporal masking model can enhance essential phonetic features by eliminating the speaker-dependent spectral tilt that reflects individual source variation. It can also enhance the spectral dynamics that convey phonological information in speech signals. These advantages yield an effective new spectral parameter for representing speech in speaker-independent speech recognition. Speaker-independent word and phoneme recognition experiments were carried out on Japanese word and phrase databases. The masked spectrum was calculated by subtracting the masking level from the logarithmic power spectra extracted by a 64-channel adaptive Q cochlear filter-bank. The masking level was calculated as a weighted sum of the smoothed preceding spectra. To cover the variability of spectral time sequences, multi-template DTW and hidden Markov models were used as the back-end recognition mechanisms.
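As an illustration of the masking computation described in the abstract, the sketch below derives a masked spectrum from a sequence of log power spectra: each frame has the masking level, a weighted sum of smoothed preceding frames, subtracted from it. The 64-channel frame width follows the text, but the smoothing method, the number of preceding frames, and the weights are hypothetical placeholders, not values taken from the report.

```python
import numpy as np

def masked_spectrum(log_power, weights, smooth_len=3):
    """Sketch of the spectro-temporal masking computation.

    log_power : (T, 64) array of log power spectra from the 64-channel
                adaptive Q cochlear filter-bank (per the abstract).
    weights   : (K,) weights applied to the K preceding smoothed frames
                (hypothetical; the actual weighting is not given here).
    smooth_len: length of a moving-average smoother along frequency
                (assumed smoothing method, for illustration only).
    """
    T, C = log_power.shape
    K = len(weights)

    # Smooth each frame along the frequency axis (assumed smoother).
    kernel = np.ones(smooth_len) / smooth_len
    smoothed = np.array([np.convolve(f, kernel, mode="same")
                         for f in log_power])

    masked = np.empty_like(log_power)
    for t in range(T):
        # Masking level: weighted sum of the K preceding smoothed spectra.
        level = np.zeros(C)
        for k in range(1, K + 1):
            if t - k >= 0:
                level += weights[k - 1] * smoothed[t - k]
        # Masked spectrum: current frame minus its masking level.
        masked[t] = log_power[t] - level
    return masked
```

In this sketch the resulting masked spectra would then be passed to the back-end recognizer (multi-template DTW or HMM) in place of conventional LPC-based features.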