TR-SLT-0012: March 29th, 2002

Norbert Binder, Rainer Gruhn

Robust Speech Recognition for Non-Native Speech Based on Phoneme Lattice Processing

Abstract: In this report, recent research on robust recognition of non-native speech is analyzed and a new approach is introduced. The goal of this method is to eliminate typical variations in non-native speech at the phoneme level. During training, phoneme substitutions and identities are extracted in a data-driven way. Variations with a low occurrence frequency are rejected, and all others are accepted as rules. In the recognition process, a phoneme lattice is generated. By applying the previously generated rules to this lattice, new variations are added. The resulting modified lattice is then transferred to the word level. The task consists of English hotel reservation conversations spoken by Japanese speakers, part of which were recorded for this thesis. An English acoustic model (AM) was trained on the Wall Street Journal speech database, and a Japanese model on the ATR TRA database. Merging the two yields a mixed AM, which allows the phonemes of both languages to be recognized, both for rule generation and in the recognition process itself.
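The two central steps of the abstract, rule extraction with a frequency cut-off and rule application to a phoneme lattice, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the function names `extract_rules` and `apply_rules`, the representation of training data as already aligned (recognized, reference) phoneme pairs, the relative-frequency threshold `min_rel_freq`, the lattice as a list of `(start, end, phoneme, score)` edges, and the rescoring of added edges by the rule's relative frequency.

```python
from collections import Counter


def extract_rules(pairs, min_rel_freq=0.2):
    """Derive substitution rules from aligned (recognized, reference) pairs.

    A pair with recognized == reference is an identity. Pairs whose relative
    frequency for a given recognized phoneme is below min_rel_freq are
    rejected; all others are kept as rules. (Threshold value is hypothetical.)
    """
    pair_counts = Counter(pairs)
    totals = Counter(recognized for recognized, _ in pairs)
    rules = {}
    for (recognized, reference), count in pair_counts.items():
        rel_freq = count / totals[recognized]
        if rel_freq >= min_rel_freq:
            rules.setdefault(recognized, {})[reference] = rel_freq
    return rules


def apply_rules(lattice, rules):
    """Add alternative edges to a phoneme lattice according to the rules.

    lattice is a list of (start_node, end_node, phoneme, score) edges.
    Each added edge spans the same nodes as the edge it was derived from;
    its score is the original score scaled by the rule's relative frequency
    (an assumed scoring scheme).
    """
    expanded = list(lattice)
    existing = {(start, end, phoneme) for start, end, phoneme, _ in lattice}
    for start, end, phoneme, score in lattice:
        for reference, rel_freq in rules.get(phoneme, {}).items():
            if (start, end, reference) not in existing:
                expanded.append((start, end, reference, score * rel_freq))
                existing.add((start, end, reference))
    return expanded


# Invented toy data: "r" was recognized where the reference had
# "r" (6 times), "l" (3 times), and "w" (once).
training_pairs = [("r", "r")] * 6 + [("r", "l")] * 3 + [("r", "w")]
rules = extract_rules(training_pairs, min_rel_freq=0.2)

lattice = [(0, 1, "r", 1.0), (1, 2, "ay", 1.0)]
expanded = apply_rules(lattice, rules)
```

In this toy run the rare pair ("r", "w") falls below the threshold and is dropped, while the "r" edge in the lattice gains a parallel "l" edge between the same nodes. The expanded lattice would then be passed on to word-level decoding, which is outside the scope of this sketch.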