川端豪,花沢利行,鹿野清宏
Word Spotting Based on HMM Phoneme Recognition
Abstract: A new technique for detecting and locating keywords in continuous speech using
HMM (hidden Markov model) phoneme recognition is proposed. HMM word
models are composed of HMM phone models trained on an isolated word database.
Because the speaking rate differs between isolated words and continuous speech,
phoneme spectra and durations change considerably. An HMM consists of
several states and arcs. Each arc has an output probability for every VQ code. In order
to cope with the spectral changes, the output probabilities are smoothed with the
probabilities of their spectral neighbor codes. In order to cope with the duration
changes, HMM state duration parameters are shifted according to a second-order
duration calibration curve. The calibration curve is derived from the ratio of the
speaking rate of continuous speech to that of isolated words. The word detection rate for 8
keywords in 25 sentences uttered by one speaker was 98.4%. Accurate word
spotting is accomplished using the HMM output probability smoothing technique
and the state duration control mechanism taking the speaking rate into account.
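
The two adaptation steps described above can be illustrated with a minimal sketch. The abstract does not give the smoothing weights or the coefficients of the calibration curve, so everything specific here is an assumption made for illustration: the Gaussian neighbor weighting, the `sigma` parameter, the coefficients `a`, `b`, `c`, and the function names `smooth_output_probs` and `calibrate_duration` are all hypothetical.

```python
import numpy as np


def smooth_output_probs(b, codebook, sigma=1.0):
    """Smooth per-arc VQ output probabilities with spectrally neighboring codes.

    b        : (n_arcs, n_codes) array, each row a distribution over VQ codes
    codebook : (n_codes, dim) array of VQ code vectors
    sigma    : ASSUMED width of the Gaussian neighbor weighting
    """
    # Squared Euclidean distance between every pair of code vectors.
    diff = codebook[:, None, :] - codebook[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)

    # ASSUMPTION: closer codes get larger weight via a Gaussian kernel.
    weights = np.exp(-dist2 / (2.0 * sigma ** 2))
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1

    # Spread the probability mass of each code over its neighbors.
    smoothed = b @ weights
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def calibrate_duration(duration, a, b, c=0.0):
    """Map an isolated-word state duration through a second-order curve.

    ASSUMPTION: the curve has the form a*d^2 + b*d + c. The coefficients
    would have to be fitted from the speaking-rate ratio of continuous
    speech to isolated words; they are not given in the abstract.
    """
    return a * duration ** 2 + b * duration + c
```

In this sketch, a code that was never observed on an arc during isolated-word training still receives a small probability if it lies close to an observed code, which is the effect the smoothing is meant to achieve.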