Toshiya Sakano, Tsuyoshi Morimoto
Design Principle of Language Model for Speech Recognition
Abstract: In speech recognition, the recognition rate can be improved by using statistical
information about linguistic texts. However, there is no criterion for selecting sample
linguistic texts from those available. If an unbalanced sample of linguistic texts is selected,
the information extracted will not be suitable for a linguistic model. To solve this
problem, we need to quantitatively analyze linguistic texts.
In this report, we introduce a method for quantitatively analyzing linguistic texts and
propose a method for selecting them. Moreover, we describe the possibility of
developing a criterion for selecting test texts needed for system evaluation, and show the
relationship between recognition system performance and the features of the sample text
group used for the linguistic model.