TR-H-0252 :1998.8.3

Alain DE CHEVEIGNÉ and Hideki KAWAHARA

A Model of Vowel Perception based on Missing Feature Theory

Abstract: Vowel identity correlates well with the shape of the transfer function of the vocal tract, or spectral envelope, rather than with the short-term spectrum, which contains peaks at multiples of the fundamental frequency (F0) that sample the spectral envelope. It is not clear how the auditory system estimates the original spectral envelope from the representations that it derives from the vowel waveform. Cochlear excitation patterns, for example, have high resolution in the low-frequency region, and their shape varies strongly with F0. The problem is acute at high F0s, where the spectral envelope is severely undersampled. This paper treats vowel identification as a form of pattern recognition with missing data. Rather than trying to interpolate the spectral envelope from an incomplete set of samples, we perform pattern matching restricted to the available samples. Missing data points are ignored. In other words, a non-uniform weighting function, dependent on F0, is used to emphasize spectral regions near harmonics of the fundamental, at the expense of other regions. The model is presented in two versions: a frequency-domain version based on short-term spectra or tonotopic excitation patterns, and a time-domain version based on autocorrelation functions. It accounts well for the fact that vowel identification is relatively insensitive to F0-related features of the short-term spectrum.
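
The frequency-domain idea can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian-bump envelopes, the rough formant values, the binary weighting (1 near a harmonic of F0, 0 elsewhere) and all function names are assumptions chosen only to show pattern matching restricted to the available samples.

```python
import math

def envelope(freqs, formants, width=150.0):
    """Toy spectral envelope: a sum of Gaussian bumps at the formant frequencies (assumed shape)."""
    return [sum(math.exp(-0.5 * ((f - c) / width) ** 2) for c in formants)
            for f in freqs]

def harmonic_weights(freqs, f0, tol=5.0):
    """F0-dependent weighting: 1 within tol Hz of a harmonic of f0, 0 elsewhere.
    A binary weight is the simplest case; a smooth weight could be used instead."""
    weights = []
    for f in freqs:
        k = max(1, round(f / f0))          # index of the nearest harmonic (0 Hz excluded)
        weights.append(1.0 if abs(f - k * f0) <= tol else 0.0)
    return weights

def weighted_distance(observed, template, weights):
    """Weighted mean squared difference; points with zero weight are ignored."""
    num = sum(w * (o - t) ** 2 for o, t, w in zip(observed, template, weights))
    return num / sum(weights)

def identify(observed, templates, freqs, f0):
    """Return the name of the template closest to the observed spectrum near harmonics of f0."""
    w = harmonic_weights(freqs, f0)
    return min(templates, key=lambda name: weighted_distance(observed, templates[name], w))

def sample_at_harmonics(env, freqs, f0):
    """Simulate a short-term spectrum: the envelope is known only at harmonics of f0."""
    w = harmonic_weights(freqs, f0)
    return [e if wi > 0 else 0.0 for e, wi in zip(env, w)]

# Hypothetical vowel templates (rough illustrative formant frequencies in Hz).
freqs = list(range(0, 3001, 10))
templates = {
    "a": envelope(freqs, (730, 1090)),
    "i": envelope(freqs, (270, 2290)),
    "u": envelope(freqs, (300, 870)),
}

for f0 in (100, 200, 400):
    observed = sample_at_harmonics(templates["i"], freqs, f0)
    print(f0, identify(observed, templates, freqs, f0))
```

Because the distance is computed only where the weight is non-zero, the gaps between harmonics do not count as evidence against any template, and the match does not depend on how densely F0 samples the envelope.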