Shigeki Matsuda, Kuldip Paliwal, Satoshi Nakamura
A Study of Speech Recognition based on
Segmental Feature Model
Abstract: We introduce a segmental feature model (SFM) that represents temporal relationships between
feature vectors. A conventional HMM can divide a feature vector sequence into its most likely
periods. Because the HMM consists of multiple states connected in temporal order, it represents
the temporal relationships between these periods. However, the temporal relationships between
the feature vectors within each period are not modeled. If these relationships are modeled as
well, we expect that feature vector sequences can be modeled more efficiently than with the
conventional HMM.
The SFM calculates the probability of a fixed-dimension segmental feature vector, which is
extracted from the variable-length period allocated to each state in the SFM.
We propose a segmental feature vector based on average values; this vector makes it possible
to calculate temporal covariances. We also propose a new SFM that has variances within a segment
(period), to reduce mismatches between a feature vector sequence and its segmental feature vector.
For the SFM using the segmental feature vector based on average values, we performed speech
recognition experiments on phoneme classification and continuous phoneme recognition. The
SFMs achieved higher recognition rates than conventional HMMs in the phoneme classification
experiments. However, in the continuous phoneme recognition experiments, the SFMs obtained lower
recognition rates than conventional HMMs. A likely cause is that the SFM does not estimate
phoneme boundaries correctly.
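
To make the idea of a fixed-dimension segmental feature vector concrete, the following is a minimal sketch and not the exact formulation used in this study. It assumes that a variable-length period is cut into a fixed number of contiguous sub-periods, that the frames in each sub-period are averaged, and that the averages are concatenated. The function name `segmental_feature` and the parameter `num_parts` are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence


def segmental_feature(frames: Sequence[Sequence[float]],
                      num_parts: int = 3) -> List[float]:
    """Map a variable-length run of frames to a fixed-dimension vector.

    Assumption (not taken from the paper): the period is split into
    `num_parts` contiguous sub-periods of near-equal length, each
    sub-period is averaged, and the averages are concatenated.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("a period must contain at least one frame")
    dim = len(frames[0])
    out: List[float] = []
    for k in range(num_parts):
        start = k * n // num_parts
        end = (k + 1) * n // num_parts
        if end <= start:
            # Period shorter than num_parts: fall back to a single frame.
            start = min(start, n - 1)
            end = start + 1
        part = frames[start:end]
        # Per-dimension average over the frames of this sub-period.
        out.extend(sum(f[d] for f in part) / len(part) for d in range(dim))
    return out


# Two periods of different lengths map to vectors of the same dimension.
long_period = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0],
               [6.0, 6.0], [8.0, 8.0], [10.0, 10.0]]
short_period = [[1.0, 2.0], [3.0, 4.0]]
print(len(segmental_feature(long_period)), len(segmental_feature(short_period)))
```

Because the concatenated averages come from successive parts of the period, a full covariance matrix estimated over such vectors would contain terms between different time positions, which is one way to read the temporal covariances mentioned above.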