TR-A-0131

TR-A-0131: 1992.1.22

Kazuaki Obara and Tatsuya Hirahara

Auditory front-end in DTW word recognition under noisy, reverberant and multi-speaker conditions.

Abstract: This report compares three front-ends for a DTW word recognition system: a fixed Q cochlear filter (FQF), an adaptive Q cochlear filter (AQF), and a Bark DFT (DFT). The FQF is a conventional cascade/parallel type cochlear filter which simulates the asymmetrical filtering characteristics of a basilar membrane system. The AQF is a nonlinear cochlear filter which simulates three level-dependent characteristics of a basilar membrane system [T. Hirahara et al., Proc. ICASSP, 496-499 (1989)]. The DFT front-end generates 64-channel Bark scale coefficients based on a 512-point DFT magnitude spectrum. All three front-ends have 64 channels covering the frequency range from 1.5 to 19.5 Bark. Recognition performance is compared for clean speech, for speech degraded by added noise and/or reverberation, and under multi-speaker conditions. Four signal-to-noise ratios, S/N = ∞ (clean), 40, 20, and 10 dB, were set by adding different levels of pink noise to the speech data. For reverberant speech, impulse responses obtained in the ATR reverberation room (RT = 0.2 and 1.1 seconds) were convolved with the speech data. The speech data used in the experiments were 216 phoneme-balanced Japanese words uttered by 2 male and 2 female speakers. A standard dynamic time warping (DTW) system was used as the back-end. The experimental results are as follows: (1) For clean speech, AQF performance is equal to that of DFT. (2) For noisy speech, AQF performs as well as FQF and is more robust than DFT. (3) For reverberant speech, AQF is affected more than DFT but performs better than FQF. (4) Under speaker variation, AQF performs better than either FQF or DFT. While the advantage of the AQF front-end is small with an HMM back-end [T. Hirahara et al., Proc. ICSLP, 381-384 (1990)], these results show that the AQF can be a better front-end for a DTW recognition system.
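The degradation and matching steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Euclidean frame distance, and the unconstrained symmetric DTW step pattern are all assumptions chosen for brevity, and the generation of pink noise and of the 64-channel front-end features is left out. The sketch shows how a noise signal might be scaled to reach a target S/N, how a room impulse response is applied by convolution, and how a textbook DTW distance picks the closest word template.

```python
import math

def scale_noise_to_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [gain * n for n in noise]

def add_noise(speech, noise, snr_db):
    """Mix speech with noise at the requested S/N (noise at least as long as speech)."""
    scaled = scale_noise_to_snr(speech, noise, snr_db)
    return [s + n for s, n in zip(speech, scaled)]

def convolve(signal, impulse_response):
    """Direct-form linear convolution, used here to simulate reverberation."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def frame_distance(a, b):
    """Euclidean distance between two feature frames (an assumed local distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(ref, test):
    """Accumulated DTW cost between two frame sequences (no slope constraints)."""
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(templates, test):
    """Return the label of the template with the smallest DTW distance to `test`."""
    return min(templates, key=lambda word: dtw_distance(templates[word], test))
```

In such a setup, each front-end (FQF, AQF, or DFT) would supply the per-frame 64-channel vectors that `dtw_distance` compares, so the back-end stays fixed while only the features change.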