NAKAMURA Satoshi, Spoken Language Translation Research Laboratories
1. Introduction
2. Heart-to-Heart Spoken Language Interface
Figure 1. Processing of Robot's Spoken Language Interface.
Current technology does not allow robots to match human beings in any of these functions. The difficulties become extreme when a human and a robot must share the same knowledge and background of daily life: the robot must understand its user's thinking patterns and social values, and must grasp the user's intention from only a few words. In fact, even human beings sometimes have difficulty interacting with each other in such complex situations.
3. Research at Spoken Language Translation Research Laboratories
Much further research remains to be done on almost all of the technology needed for interfacing robots with human speech. Besides technology for understanding the content of speech, technology is also needed for generating actions, for more sophisticated recognition of voice sounds, and for recognizing emotion. Especially vigorous research is required to realize mutual understanding between robots and their users, with heart-to-heart communication as the goal.
Table 1. Current State of Speech Interface Technology.
Item | Comments |
Sound retrieval | Related technology includes research on suppressing noise with adaptive filters and on microphone array signal processing, which uses multiple microphone elements. With multiple elements, sound retrieval performance can sometimes surpass that of humans. |
Speech recognition | If a user speaks clearly, robots can recognize even difficult words. In a noisy environment, however, and when no special effort is made to speak clearly, including when speaking styles change, recognition performance deteriorates considerably. |
Emotion understanding | Research so far covers coarse emotion recognition based on rough information about the tone of speech. This technology has not yet reached the level where it can be used in conversational speech recognition. |
Speech understanding | Speech understanding is possible only in very limited domains. The understanding of speech and intent required for general robot tasks remains to be tackled in the future. |
Gesture generation | ・Generation of dialogue: Dialogues are possible in certain specific domains but are difficult to achieve in general domains and domains that change frequently. |
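The microphone array processing listed under sound retrieval in Table 1 can be illustrated with delay-and-sum beamforming, a standard textbook technique. The article does not say which array method the laboratory uses, so this is only a minimal sketch under stated assumptions: the function name `delay_and_sum`, the integer sample delays, and the toy signal are all hypothetical, and the arrival delay at each microphone is assumed to be known in advance.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its known integer delay, then average.

    channels: list of sample lists, one per microphone (hypothetical input).
    delays:   non-negative arrival delay of the target sound at each
              microphone, in samples (assumed known for this sketch).
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")

    # Only the span covered by every channel after alignment is usable.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    if usable <= 0:
        raise ValueError("channels are too short for the given delays")

    n = len(channels)
    # Averaging the aligned samples keeps the target sound unchanged, while
    # noise that differs from microphone to microphone partly averages out.
    return [
        sum(ch[d + t] for ch, d in zip(channels, delays)) / n
        for t in range(usable)
    ]


# Toy example: three microphones hear the same source 0, 1, and 2 samples late.
source = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
delays = [0, 1, 2]
channels = [[0.0] * d + source for d in delays]
aligned = delay_and_sum(channels, delays)
```

In this noise-free toy case the aligned average simply reproduces the source. With real recordings the delays would have to be estimated, and they are generally not whole numbers of samples.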
4. Conclusion