TR-H-0174 :1995.11.13

Eric Vatikiotis-Bateson, Kevin G. Munhall, Makoto Hirayama, Yuenchang Lee, Demetri Terzopoulos


Abstract: While it is well known that faces provide linguistically relevant information during communication, most efforts to identify the visual correlates of the acoustic signal have focused on the shape, position and luminance of the oral aperture. In this work, we extend the analysis to full facial motion under the assumption that the process of producing speech acoustics generates linguistically salient visual information, which is distributed over large portions of the face. Support for this is drawn from our recent studies of the eye movements of perceivers during a variety of audiovisual speech perception tasks. These studies suggest that perceivers detect visual information at low spatial frequencies and that such information may not be restricted to the region of the oral aperture. Since the facial and vocal tract systems are biomechanically linked through close proximity and shared physiology, we propose that physiological models of speech and facial motion be integrated into one audiovisual model of speech production. In addition to providing a coherent account of audiovisual motor control, the proposed model could become a useful experimental tool, providing synthetic audiovisual stimuli with realistic control parameters.