TR-IT-0301: March 1999

Eric M. Visser

Japanese Speech Recognition Lexicon Revision

Abstract: In the course of a detailed examination of the recognition results (lattices) output by ATR-SPREC, I came across a large number of phenomena that were detrimental to recognition quality, yet seemed unnecessary. As a consequence, I resolved to clean up the recognition lexicon; this Technical Report details my labours and presents an evaluation. Cleaning up the lexicon involved removing "superfluous" entries (I will explain below which entries I considered superfluous, and why) and correcting some errors. One important reason why I did this admittedly boring work is that the low quality of the lexicon seriously hinders rule-based post-processing. The result of my work is a master lexicon, which contains all the information in the original lexicon plus all the corrections, and a number of scripts to create training and/or recognition lexicons and to extract various kinds of data from the master lexicon. To evaluate the lexicon, I modified the answer files used for training and testing so that they reflected the changes in the lexicon, re-trained the language model, and ran recognition experiments.