TR-I-0057: 1988.11.21

Takeshi Kawabata, Toshiyuki Hanazawa, Kiyohiro Shikano

Word Spotting Based on HMM Phoneme Recognition

Abstract: A new technique for detecting and locating keywords in continuous speech using HMM (Hidden Markov Model) phoneme recognition is proposed. HMM word models are composed of HMM phone models trained on an isolated word database. Because the speaking rate differs between isolated words and continuous speech, phoneme spectra and durations change considerably. An HMM consists of several states and arcs, and each arc has an output probability for each VQ code. To cope with the spectral changes, the output probabilities are smoothed with the probabilities of their spectrally neighboring codes. To cope with the duration changes, the HMM state duration parameters are shifted according to a second-order duration calibration curve, which is obtained from the speaking rate ratio of continuous speech to isolated words. The word detection rate for 8 keywords in 25 sentences uttered by one speaker was 98.4%. Accurate word spotting is thus accomplished by combining the HMM output probability smoothing technique with a state duration control mechanism that takes the speaking rate into account.
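The two adaptation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the report's implementation: the abstract does not give the smoothing weights or the calibration coefficients, so the Gaussian distance kernel, the `sigma` parameter, the function names, and the coefficients `a`, `b`, `c` are all assumptions introduced here for illustration.

```python
import math


def smooth_output_probs(probs, codebook, sigma=1.0):
    """Smooth one arc's VQ output probabilities over spectrally close codes.

    probs[k]    : output probability of VQ code k on this arc
    codebook[k] : centroid vector of VQ code k
    sigma       : kernel width (hypothetical parameter, not from the report)
    """
    n = len(probs)
    smoothed = []
    for k in range(n):
        # Assumed weighting: Gaussian kernel on the Euclidean distance
        # between code k's centroid and every other centroid.
        weights = [
            math.exp(-math.dist(codebook[k], codebook[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        total_w = sum(weights)
        smoothed.append(sum(w * p for w, p in zip(weights, probs)) / total_w)
    # Renormalize so the smoothed values again form a distribution.
    norm = sum(smoothed)
    return [s / norm for s in smoothed]


def calibrate_duration(mean_duration, a, b, c):
    """Shift a state duration parameter along a second-order curve.

    The coefficients a, b, c are hypothetical; in the report the curve is
    derived from the speaking rate ratio of continuous speech to isolated words.
    """
    return a * mean_duration ** 2 + b * mean_duration + c


# Toy example: three VQ codes on a one-dimensional "spectrum".
toy_codebook = [(0.0,), (1.0,), (10.0,)]
toy_probs = [1.0, 0.0, 0.0]
toy_smoothed = smooth_output_probs(toy_probs, toy_codebook, sigma=1.0)
toy_duration = calibrate_duration(10.0, a=0.0, b=0.5, c=0.0)
```

In the toy example, the code that never occurred in training but lies close to the observed code receives a noticeable share of the probability mass, while the distant code stays near zero. This is the intended effect of smoothing: a spectrum that drifts slightly in continuous speech is no longer scored as impossible.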