TR-I-0177

TR-I-0177: 1990.9

Masami Nakamura (中村雅巳), Shin-ichi Tamura (田村震一)

Vowel Recognition Using Neural-Network Phoneme Filters

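Abstract: This report proposes a phoneme filter based on neural networks (PFN; Phoneme Filter Neural Network) and evaluates it in vowel recognition experiments. A PFN is a multi-layer neural network with a compressed hidden layer, and one PFN is prepared for each phoneme. Each PFN is trained as an identity mapping on patterns of a single phoneme only, so it acts as a phoneme filter that passes only that phoneme without distortion. At recognition time, the speech pattern is input to every PFN, the similarity between each PFN's output pattern and the speech pattern is computed, and recognition is performed by comparing these similarities. Conventional classification-type neural networks for speech recognition have the problem that the output values for the second and lower-ranked candidates are suppressed to nearly 0 and cannot be used as recognition scores. In recognition experiments on the five Japanese vowels, the cumulative recognition rates for the second and lower ranks were better with this method than with a conventional classification-type neural network, showing that the recognition scores reflect closeness to the candidate categories. In addition, a comparative experiment against a method based on principal component analysis confirmed the effect of the PFN's non-linear mapping.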

This paper describes a phoneme filter neural network (PFN) approach to vowel recognition. Most conventional neural networks for speech recognition have a serious drawback: their output values do not correspond to candidate likelihoods. A PFN is a multi-layer neural network with fewer hidden units than input units, and one PFN is prepared for each phoneme category. Each network is trained as an identity mapping on speech data belonging to a single phoneme category. In the recognition process, the distance between the input data and the output data is computed for each network, and the distances are compared. When applied to a Japanese vowel recognition task, the PFN achieved higher recognition rates for the top 2 or more choices than a conventional 3-layer neural network. The experiments also confirmed that the PFN outputs represent candidate likelihoods and that, because of its non-linearity, the 5-layer PFN outperformed the 3-layer PFN.
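
The recognition scheme summarized above can be illustrated with a minimal NumPy sketch. It is not the report's implementation: the 3-layer architecture (one tanh hidden layer, linear output), the plain gradient-descent training, the squared Euclidean distance, and the synthetic 6-dimensional three-class data are all assumptions made for illustration, and the names `train_pfn`, `pfn_distances`, and `make_data` are hypothetical. The report itself uses speech patterns of the five Japanese vowels and also evaluates a 5-layer PFN.

```python
import numpy as np

def train_pfn(X, n_hidden=2, epochs=1000, lr=0.02, seed=0):
    """Train one bottleneck network as an identity mapping on a single class."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # compressed hidden layer
        Y = H @ W2 + b2                   # linear reconstruction of the input
        E = (Y - X) / n                   # gradient of 0.5 * mean squared error w.r.t. Y
        dH = (E @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ E)
        b2 -= lr * E.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def pfn_distances(pfns, X):
    """Squared distance between X and each PFN's output, shape (n_samples, n_classes)."""
    cols = []
    for W1, b1, W2, b2 in pfns:
        Y = np.tanh(X @ W1 + b1) @ W2 + b2
        cols.append(((Y - X) ** 2).sum(axis=1))
    return np.stack(cols, axis=1)

def make_data(n_per_class, seed):
    """Synthetic stand-in for speech patterns: three noisy clusters in 6-D (assumption)."""
    rng = np.random.default_rng(seed)
    means = 2.0 * np.eye(3, 6)
    X = np.concatenate([m + 0.3 * rng.normal(size=(n_per_class, 6)) for m in means])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y

X_train, y_train = make_data(100, seed=1)
X_test, y_test = make_data(50, seed=2)

# One PFN per category, each trained only on that category's data.
pfns = [train_pfn(X_train[y_train == k], seed=k) for k in range(3)]

dist = pfn_distances(pfns, X_test)
ranking = np.argsort(dist, axis=1)        # column 0 = best candidate, column 1 = runner-up
accuracy = (ranking[:, 0] == y_test).mean()
print("top-1 accuracy on synthetic data:", accuracy)
```

Because every category keeps its own distance in `dist`, the second and lower-ranked candidates retain graded scores instead of being pushed toward 0, which is the property the abstract contrasts with classification-type networks.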