Alex Park
Relating Phonetic Feature Stream Reliability to
Noise Robust Speech Recognition
Abstract: This report describes an attempt to improve robustness in automatic speech
recognition by optimizing the extraction of phonetic feature streams
independently of the recognition process. The eventual goal is to use a bank of
feature extraction modules that apply specialized signal processing and
statistical techniques to reliably extract speech-relevant features from the
acoustic signal. In this work, we demonstrate the viability of using a sparse
set of feature streams for a simple connected digit recognition task. We then
illustrate some techniques for improving the reliability of the voicing feature
module and evaluate several alternative modules using both clean and noisy data.
Finally, we relate the reliability of extraction for the individual voicing module
to the overall performance of the recognizer by performing recognition experiments
on the Aurora 2 database.