YANO Sumio, Department of Vision Dynamics, Human Information Science Laboratories *
1. Introduction
Our development of communication technology related to the human visual system takes two approaches: visual perception and visual production. Once we understand the mechanisms of visual perception, we can feed the findings back into a visual production system designed to conform to our visual system. In turn, new technologies for visual production can be used to explore models of visual perception. In this article, two topics are described: 1) analysis of our visual perception of moving objects and applications of the results to sports training and rehabilitation, and 2) exploration of face-to-face communication using talking-head technology.

2. Visual prediction of moving objects
Fig. 1. Experiment on visual prediction using a wide-field stereoscopic display.
The catching of a moving object is considered one of the important skills in sports training, which generally comprises training of both visual and motor control functions. We found that the visual prediction function used for a moving object can be improved by practicing with the visual system alone. Furthermore, we found that not only does prediction accuracy for a moving object improve, but prediction confidence also increases [2]. We are investigating effective practice methods for improving visual prediction ability, with respect to effective virtual-image display methods, learning methods, and identification of the localized brain regions related to visual prediction. We plan to develop a new visual presentation system by continuing this research, and we will use the results to develop effective sports training and rehabilitation systems, which will play an important role in the coming aging society.

3. The role of facial information in personal communication
We also developed a talking-head animation system as a tool for synthesizing a life-like face. Using this software tool, we began our investigation with the visual perception of exaggerated emotional facial expressions, such as exaggerated smiles. Figure 3 shows the basic method for synthesizing the talking-head animation. First, three-dimensional shape data for the face and the accompanying speech audio data are acquired. Second, mesh data are fitted to the face image, and principal component analysis (PCA) is conducted. Finally, selected PCA components are linearly combined based on the speech audio data. These processes synthesize a face animation that reproduces the subject's facial expressions and speech behavior [4]. Therefore, even if the face is that of a cat or a comic-book hero, it can be driven by speech audio and motion-capture data as long as three-dimensional shape data exist. We have been developing a talking-head animation system that can be applied to any person by utilizing a three-dimensional shape database of the faces of several hundred subjects. The role of facial information in personal communication is investigated using computer software that allows facial motion to be manipulated with complete control. This research is connected to the development of a natural interactive interface for human communication.
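As a rough illustration of the synthesis steps above, the following sketch builds a face frame as the mean mesh plus a linear combination of PCA components whose weights would, in a real system, be derived from the speech signal. This is not the authors' code; the mesh sizes, number of components, and weight values are all hypothetical.

```python
import numpy as np

# Illustrative data: N face meshes, each flattened to a vector of 3-D
# vertex coordinates. Real systems would use captured mesh data instead.
rng = np.random.default_rng(0)
n_meshes, n_coords = 50, 300
meshes = rng.normal(size=(n_meshes, n_coords))

# Step 1: PCA on the mesh data, giving a mean face and principal components.
mean_face = meshes.mean(axis=0)
U, S, Vt = np.linalg.svd(meshes - mean_face, full_matrices=False)
components = Vt[:5]  # keep the first 5 principal components

# Step 2: synthesis. A new frame is the mean face plus a linear
# combination of components; the weights stand in for values that
# would be computed from the speech audio at each time step.
speech_weights = np.array([1.2, -0.4, 0.8, 0.0, 0.3])  # hypothetical
frame = mean_face + speech_weights @ components
print(frame.shape)  # one synthesized mesh vector of length n_coords
```

Because the components are just directions in mesh space, the same speech-driven weights can animate any mesh with the same topology, which is consistent with the article's point that a cat or comic-book face can be driven by the same data.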
4. Conclusion

References

* YANO Sumio is currently at NHK Science & Technical Research Laboratories