Takashi KOMORI, Shigeru KATAGIRI
A Minimum Error Approach to Spotting-Based Speech Recognition
Abstract: Word spotting is a fundamental approach to the recognition and understanding of natural, spontaneous spoken language. A complete spotting system, comprising word models and decision thresholds, should above all be optimized to minimize all spotting errors. However, in most conventional spotting systems, the word models and the thresholds are designed separately and heuristically; there has been no theoretical basis for designing the overall system consistently. This paper introduces a novel approach to word spotting by proposing a new design method called Minimum Spotting Error learning (MSPE). MSPE is conceptually based on a recent discriminative learning theory, namely Minimum Classification Error learning (MCE)/the Generalized Probabilistic Descent method (GPD); it features a rigorous framework for minimizing spotting error objectives.
MSPE can be applied to a wide range of pattern spotting tasks, such as spotting spoken phonemes and written characters as well as spoken words.
Experimental results for a Japanese phoneme spotting task clearly demonstrate the promise of the proposed approach.
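The central idea of the abstract, optimizing the model and the decision threshold jointly against a smoothed count of spotting errors, can be illustrated with a toy sketch. This is not the formulation used in the paper: the one-dimensional features, the negative squared-distance score, the sigmoid loss, and all names and constants (`score`, `train`, `alpha`, `lr`, the data values) are assumptions chosen only to show the general MCE/GPD-style mechanism.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, m):
    """Hypothetical discriminant: negative squared distance to a reference m."""
    return -(x - m) ** 2

def count_errors(data, m, h):
    """Hard spotting errors: misses plus false alarms for model m, threshold h."""
    errors = 0
    for x, is_target in data:
        detected = score(x, m) > h
        if detected != is_target:
            errors += 1
    return errors

def train(data, m, h, alpha=2.0, lr=0.05, epochs=200):
    """Sample-by-sample gradient descent on a sigmoid-smoothed error count.

    Both the model parameter m and the threshold h are updated from the
    same loss, rather than being tuned separately.
    """
    for _ in range(epochs):
        for x, is_target in data:
            # d > 0 means an error: a miss for targets, a false alarm otherwise.
            sign = 1.0 if is_target else -1.0
            d = sign * (h - score(x, m))
            loss = sigmoid(alpha * d)
            dloss_dd = alpha * loss * (1.0 - loss)
            # Chain rule: dd/dh = sign, dd/dm = -sign * dscore/dm,
            # where dscore/dm = 2 * (x - m).
            grad_h = dloss_dd * sign
            grad_m = dloss_dd * sign * (-2.0 * (x - m))
            m -= lr * grad_m
            h -= lr * grad_h
    return m, h

# Toy data (assumed): target tokens cluster near 3.0, non-targets lie elsewhere.
data = [(2.8, True), (3.0, True), (3.2, True), (2.9, True), (3.1, True),
        (0.0, False), (0.5, False), (-1.0, False), (6.0, False), (5.5, False)]

m0, h0 = 1.5, -2.0           # mis-centered model, poorly placed threshold
m1, h1 = train(data, m0, h0)
print("errors before:", count_errors(data, m0, h0))
print("errors after: ", count_errors(data, m1, h1))
```

In this sketch no threshold alone can fix the initial configuration, because one non-target scores higher than every target; the error count can only fall if the model parameter moves as well, which is the motivation for optimizing both together.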