TR-A-0068: 1990.2.2

エレキ・ターディフ, 平原達也 (Tatsuya Hirahara)

A Study of a Functional Model of the Inner Hair Cell

Abstract: Despite a great deal of effort put into building a machine that recognizes human speech as humans do, there is as yet no other system that processes speech with the precision and versatility of the human auditory system. One reason that good performance has not been achieved may be the spectral analysis method used. In many systems, the short-term power spectrum based on the Fourier transform, or coefficients obtained by LPC (Linear Predictive Coding) analysis, have usually been used as input vectors.

These traditional spectral analysis methods are well defined mathematically and very easy to use. However, the speech spectrum representation they give is very different from the actual representation of the speech signal in the auditory pathway, because the auditory spectral analyzer is essentially a nonlinear system while the traditional spectral analyzers are linear systems. Hence, we believe that better results could be obtained with a spectral analysis method that simulates the signal transformation functions of the human auditory system.
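The linear/nonlinear distinction drawn above can be illustrated with a small superposition check. This is only a sketch under stated assumptions: the plain DFT stands in for the linear transform underlying the traditional analyzers, and a half-wave rectifier is used as a generic memoryless nonlinearity. The rectifier, the helper names (`dft`, `half_wave_rectify`, `obeys_superposition`) and the two test tones are all illustrative choices, not components taken from the models discussed in this report.

```python
import cmath
import math


def dft(frame):
    """Plain O(N^2) discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def half_wave_rectify(frame):
    """Illustrative memoryless nonlinearity: keep positive samples, zero the rest."""
    return [max(x, 0.0) for x in frame]


def obeys_superposition(system, a, b, tol=1e-9):
    """Check whether system(a + b) equals system(a) + system(b) for these inputs."""
    combined = system([x + y for x, y in zip(a, b)])
    separate = [p + q for p, q in zip(system(a), system(b))]
    return all(abs(c - s) <= tol for c, s in zip(combined, separate))


# Two hypothetical test tones on a 32-sample frame (assumed values).
N = 32
tone_a = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
tone_b = [0.5 * math.sin(2 * math.pi * 7 * t / N) for t in range(N)]

dft_is_linear = obeys_superposition(dft, tone_a, tone_b)
rectifier_is_linear = obeys_superposition(half_wave_rectify, tone_a, tone_b)
print(dft_is_linear, rectifier_is_linear)
```

For these inputs the DFT passes the superposition check and the rectifier fails it, which is the sense in which a nonlinear analyzer can respond to a mixture differently from the sum of its responses to the individual components.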

The superiority of nature's design has long been a subject of intensive study. A number of physiological and psychophysical studies have been performed on the auditory system [G. von Bekesy (1960), E. C. Carterette & M. P. Friedman (1978), B. C. J. Moore (1982), J. O. Pickles (1982), R. Carlson & B. Granström (1982), J. V. Tobias & E. D. Schubert (1983), J. B. Allen et al. (1985), C. Berlin (1985), A. R. Moller (1983, 1986), G. M. Edelman et al. (1988)].

These results partially clarify how the auditory functions work, but many things remain obscure or unknown, particularly for the higher-level auditory system. The auditory peripheral system, by contrast, gives us reliable information and data, such as the sound spectral transformation that takes place in the cochlea, where nonlinear spectral analysis is performed by a basilar membrane system and a hair cell system. Inner hair cells are the sound-to-impulse transducers, in which basilar membrane displacement is transformed into nerve impulse trains. Using this knowledge and data, many auditory models have been developed, some of which try to simulate the signal processing mechanisms or functions in the cochlea. Good surveys may be found in J. Ujihara (1976), J. B. Allen (1985), Proceedings of the Montreal Symposium on Speech Recognition (1986), and S. Greenberg (1988).

In this report, we describe a functional model of the auditory peripheral system. We already have a nonlinear cochlear filter model which simulates the adaptive Q filtering characteristics of the basilar membrane system (T. Hirahara et al., 1989). The spectrograms obtained by this model are encouraging: they always give a more appropriate speech spectrum representation than the traditional linear cochlear filter bank or the DFT spectrogram. Therefore, the next step is to place an inner hair cell model after this nonlinear cochlear filter model.

We focused on a computational inner hair cell model proposed by S. Seneff, since her model simulates the principal inner hair cell functions very well with simple circuits. First, we trace her model precisely so as to find its advantages and disadvantages. Next, we describe some modifications of the model that allow it to be used with the nonlinear cochlear filter model. Then we examine the properties of the complete cochlear model, i.e. a nonlinear cochlear filter model followed by a nonlinear inner hair cell model. Finally, the application of this model to speech with added noise is studied and an improvement is proposed.