Norbert Binder, Rainer Gruhn
Robust Speech Recognition for Non-Native
Speech Based on Phoneme Lattice Processing
Abstract: This report analyzes recent research on robust recognition of non-native speech and introduces a new
approach. The goal of this method is to eliminate typical variations in non-native speech at the
phoneme level.
During training, phoneme substitutions and identities are extracted in a data-driven way. Variations
with a low occurrence frequency are rejected, and all others are accepted as rules. In the recognition process, a
phoneme lattice is generated. By applying the previously generated rules to this lattice, new variations are
added. The resulting modified lattice is then transferred to the word level. The task consists of English conversations on
hotel reservation spoken by Japanese speakers, which were in part recorded for this thesis. An English acoustic
model (AM) was trained on the Wall Street Journal speech database, and a Japanese model on the ATR
TRA database. Merging the two yields a mixed AM, which allows recognition of the phonemes of
both languages, both for rule generation and for the recognition process itself.
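
The two steps described above (frequency-based rule extraction and rule application to a phoneme lattice) can be illustrated with a minimal sketch. It is not the implementation used in this work. Everything in it is an assumption made for illustration: the function names extract_rules and apply_rules, the representation of the lattice as (start, end, phoneme) edges without scores, the threshold value, the normalization of frequencies by the canonical phoneme count, and the toy phoneme symbols.

```python
from collections import Counter, defaultdict


def extract_rules(aligned_pairs, min_rel_freq=0.2):
    """Derive rules from (canonical, observed) phoneme pairs.

    Assumption: a pair is kept when its frequency relative to all
    occurrences of the canonical phoneme reaches min_rel_freq.
    Identities (canonical == observed) are counted like substitutions.
    Returns a dict mapping observed phoneme -> set of canonical phonemes.
    """
    pairs = list(aligned_pairs)
    pair_counts = Counter(pairs)
    canon_counts = Counter(canonical for canonical, _ in pairs)
    rules = defaultdict(set)
    for (canonical, observed), n in pair_counts.items():
        if n / canon_counts[canonical] >= min_rel_freq:
            rules[observed].add(canonical)
    return dict(rules)


def apply_rules(lattice, rules):
    """Add a parallel edge for every rule that matches an edge's phoneme.

    The lattice is a list of hypothetical (start, end, phoneme) edges;
    original edges are always kept.
    """
    edges = set(lattice)
    for start, end, phoneme in lattice:
        for canonical in rules.get(phoneme, ()):
            edges.add((start, end, canonical))
    return sorted(edges)


# Toy alignment data (invented for illustration only).
pairs = (
    [("r", "l")] * 3 + [("r", "r")] * 6 + [("r", "w")] * 1
    + [("th", "s")] * 4 + [("th", "th")] * 4
)
rules = extract_rules(pairs, min_rel_freq=0.2)

# Recognized phoneme lattice with a single path: l - ay - s
lattice = [(0, 1, "l"), (1, 2, "ay"), (2, 3, "s")]
expanded = apply_rules(lattice, rules)
print(rules)
print(expanded)
```

In this toy data the rare pair ("r", "w") falls below the threshold and produces no rule, while the edges labeled "l" and "s" each receive a parallel edge carrying the canonical phoneme. A real system would also carry acoustic scores on the edges and weight the added variations, which this sketch omits.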