Petra Philips, Mike Schuster
Adaptation of BRNN Speech Recognition Systems
Abstract: Because speaker-independent large-vocabulary systems need huge amounts of training data, the parameters of the acoustic units have a high variance and thus model individual
utterances poorly and are sensitive to changes of environment (speaker or channel). One approach
to this problem is to transform the feature and/or model space in order to reduce the mismatch
between the acoustic data and the acoustic models of the system.
We present experimental results obtained with supervised and unsupervised adaptation of a
hybrid BRNN (Bidirectional Recurrent Neural Network) phoneme recognition system on TIMIT
data using
1. a Linear Input Network (LIN), illustrated by the sketch after this list, and
2. retraining of the BRNN with weight sharing.
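As a rough illustration of the first technique, the sketch below prepends a trainable linear layer to a frozen, pretrained acoustic model, so that only the linear input transform is updated on the adaptation data. This is a minimal Python/PyTorch sketch under assumed settings (feature dimension, identity initialization, choice of optimiser); it is not the paper's implementation.

    import torch
    import torch.nn as nn

    class LINAdapter(nn.Module):
        """Prepends a trainable linear input transform (LIN) to a frozen acoustic model."""
        def __init__(self, pretrained_net: nn.Module, feat_dim: int):
            super().__init__()
            self.lin = nn.Linear(feat_dim, feat_dim)
            nn.init.eye_(self.lin.weight)    # start from the identity transform,
            nn.init.zeros_(self.lin.bias)    # i.e. from the unadapted system
            self.net = pretrained_net
            for p in self.net.parameters():  # freeze the speaker-independent model
                p.requires_grad_(False)

        def forward(self, x):
            return self.net(self.lin(x))

    # During adaptation only the LIN parameters are optimised, e.g.:
    # adapter = LINAdapter(pretrained_net, feat_dim=39)  # 39 is an assumed feature dimension
    # optimiser = torch.optim.SGD(adapter.lin.parameters(), lr=1e-3)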
We also show how unsupervised adaptation can be improved using only a simple acoustic confidence measure based on the posterior probability of the recognized class for each frame.
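A frame-level confidence filter of this kind could look roughly like the following sketch: the posterior of the recognized (arg-max) class serves as the confidence score for each frame, and only frames above a threshold are kept for the unsupervised adaptation update. The function interface and the threshold value are assumptions for illustration, not details taken from the paper.

    import numpy as np

    def select_confident_frames(posteriors: np.ndarray, threshold: float = 0.7):
        """posteriors: array of shape (num_frames, num_classes) with frame-wise
        class posteriors from the recogniser; the threshold is an assumed value."""
        recognised = posteriors.argmax(axis=1)  # hypothesised class per frame
        confidence = posteriors[np.arange(posteriors.shape[0]), recognised]
        mask = confidence >= threshold          # keep only confident frames
        return recognised[mask], mask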