TR-H-0240 :1998.3.16

Alain BIEM

Discriminative Feature Extraction Applied to Speech Recognition

Abstract: Will it be possible to talk to your computer? This question summarizes the paradigm of automatic speech recognition. The building of a machine that simulates the human perception process has been, and still is, the dream of many researchers worldwide. In addition to the straightforward economic opportunities that would result from speech recognition applications, such as voice dialing, automatic dictation, and more sophisticated human-machine interactions, a computer that can perceive human speech would also be useful for the handicapped. Automatic transcription of TV programs will be available for the deaf, and people without arms will find themselves suddenly capable of performing any basic task that usually requires the use of hands. Obviously, using our voice in place of our hands will have revolutionary effects on our way of living. However, from the design of a simple "phonetic typewriter" to the realization of a "perceptual" machine, various difficulties stand in the way of realizing these goals. The most fundamental and certainly the most difficult task is to find out what "perception" is. The mystery is that we use our perceptual abilities on a daily basis without being able to understand this fundamental concept. The debate over whether this "self-perception" is possible is beyond the scope of this report. However, since it is in human nature to try to simulate nature, pattern recognition, which means, for instance, recognizing a shape, distinguishing a dog from a cat, or telling one person's voice from another's, is one aspect of these human capabilities that has been the target of many simulation attempts by computers. The human perception process is an astounding pattern recognizer. It therefore seems natural to study the baseline process that permits a given person to recognize thousands of words, even when uttered in a very noisy environment.
However, pattern recognition research based on perceptual simulation is limited because the physiological and biological aspects of the human perception process cannot be investigated in vivo. The field of psychoacoustics, which considers the human being as a black box and tries to comprehend the inherent perceptual processes by analyzing responses to selected stimuli, sometimes produces conflicting perceptual models. A good model of human perception is thus still awaited. Instead of waiting for the final theory of "what cognition is", researchers have preferred the use of mathematical tools in the pattern recognition field, assuming that despite the lack of a clear perceptual theory, a few problems may be solved by classical mathematical models. In particular, pattern recognition through classification is one of the problems that a computer should be able to solve. Classification, that is, the assignment of events or objects to prescribed categories, is the first step toward the building of a perceptual machine. A wide body of literature covers the subject; it has led to a theoretical formulation of pattern recognition and has proposed solutions to particular problems. Speech recognition by machine can be viewed as one particular aspect of a more general machine-based theory of pattern recognition.