LOPEZ-GULLIVER Roberto,
SUZUKI Masami

Media Information Science Laboratories
Department of Kansei & Learning Media




1. Introduction
  When sharing our own experiences with others, we use photo albums, videotapes, music CDs, etc. to support our communication. The SenseWeb system [1] aims at supporting group sharing of experiences: it allows multiple users to simultaneously access and control multimedia information (images, sound, video, etc.) naturally, using their own hands and voices [2,3].
  As an example, imagine you have recently been to Kyoto, visited the Kinkakuji temple, and want to tell your friends what a wonderful experience it was. Surely you have taken lots of pictures and videos. Now, how do you show them to your friends so they can share your experience? You could put the pictures in photo albums and pass them around one by one. You could also run a slide show of the pictures and videos on your TV or computer monitor while your friends and family sit passively and watch.
  If you use a computer to store, organize, classify, and retrieve information, you will already have some kind of graphical user interface with a mouse and keyboard. This might work well in some cases, but it is not really intuitive without some training. More importantly, it is not well suited to group discussions where everybody wants to take partial control of the interaction simultaneously.

2. SenseWeb Features
  The SenseWeb system aims at overcoming the problems discussed in the previous section. That is, it 1) supports group discussions and 2) simplifies computer interaction by avoiding the mouse and keyboard as input devices.
  Figure 1 shows a typical scene of users interacting with the SenseWeb system. Users can simultaneously touch the screen with their hands to manipulate the data presented. They use spoken keywords or simple voice commands to trigger the retrieval and display of further information elements.

Figure 1: SenseWeb, everybody at the same time. Users employ simple hand gestures and voice to interact with multimedia information, supporting group sharing of experiences.

  Let's now see how the SenseWeb system would help in the Kinkakuji example from the previous section. Without a mouse or keyboard, you simply say "Kinkakuji" to have the computer search for and retrieve related images and videos. You can then interact with these images and videos using simple hand movements and gestures; there is no need to learn complicated interaction techniques or new skills.
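The voice-driven retrieval step can be pictured as a keyword lookup against a tagged media database. The following is only an illustrative sketch: the database contents, file names, and `retrieve` function are invented for this example and do not reflect the actual SenseWeb internals, which the article does not describe.

```python
# Hypothetical sketch: spoken keyword -> list of related media items.
# The tag-to-file mapping below is invented for illustration.

MEDIA_DB = {
    "kinkakuji": ["kinkakuji_01.jpg", "kinkakuji_tour.mp4"],
    "momiji": ["momiji_01.jpg", "momiji_02.jpg"],
}

def retrieve(keyword: str) -> list:
    """Return the media items tagged with the recognized keyword."""
    return MEDIA_DB.get(keyword.lower(), [])

print(retrieve("Kinkakuji"))  # → ['kinkakuji_01.jpg', 'kinkakuji_tour.mp4']
```

In the real system the keyword would come from a speech recognizer rather than a string literal, and the retrieved items would appear as draggable icons on the shared screen.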
  For example, by simply touching the images, you can grab and drag them around the screen. You can also zoom images by touching them with two hands and, by holding them long enough, select them for bookmarking. By speaking, you can add more keywords to the discussion topic; saying "momiji", for example, brings up related images from your database.
  What about refining your search to "Kinkakuji AND momiji"? Simply grab both images and bring them together: the system interprets this gesture as a request to refine the search to both keywords. Figure 2 shows actual screenshots of these and other hand-gesture interactions.
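The "bring two images together" gesture can be sketched as a proximity test between two dragged icons that, when satisfied, joins their keywords into a combined query. The `Icon` structure, the 50-pixel threshold, and the function names are all assumptions made for this sketch, not the actual SenseWeb code.

```python
# Hypothetical sketch: interpreting "bring two icons together"
# as a request for an AND-refined search.

from dataclasses import dataclass

@dataclass
class Icon:
    keyword: str  # search keyword this icon represents
    x: float      # current screen position
    y: float

def overlapping(a, b, threshold=50.0):
    """Treat two icons dragged within `threshold` pixels as touching."""
    return (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= threshold ** 2

def refine_query(a, b):
    """If two icons are brought together, refine to both keywords."""
    if overlapping(a, b):
        return "{} AND {}".format(a.keyword, b.keyword)
    return None

kinkakuji = Icon("Kinkakuji", 100, 100)
momiji = Icon("momiji", 120, 110)
print(refine_query(kinkakuji, momiji))  # → 'Kinkakuji AND momiji'
```

A real multi-touch system would run such a test continuously as icons move, debouncing brief overlaps so that merely dragging one icon past another does not trigger a refinement.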
  To realize this type of interaction, the SenseWeb system detects the infrared shadows of the users' hands as they touch the screen. Depending on the size and shape of these shadows, the system triggers the corresponding actions that control the image icons (zoom, AND, bookmark, etc.).
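The shadow-to-action mapping can be illustrated as a simple classifier over detected blob features. The article only states that shadow size and shape select the action, so the specific thresholds, the duration criterion, and the event names below are hypothetical:

```python
# Hypothetical sketch: mapping a detected infrared shadow to a gesture
# event. Thresholds and event names are invented for illustration; the
# article says only that shadow size and shape determine the action.

def classify_shadow(area_px, duration_s):
    """Pick a gesture event from a shadow's area and contact duration."""
    if area_px > 8000:        # large blob: two hands spanning an image
        return "zoom"
    if duration_s > 1.5:      # single touch held long enough
        return "bookmark"
    return "drag"             # default: single brief touch

print(classify_shadow(area_px=9000, duration_s=0.2))  # → 'zoom'
print(classify_shadow(area_px=3000, duration_s=2.0))  # → 'bookmark'
```

Because each shadow blob is classified independently, several users touching the screen at once simply produce several concurrent events, which is what enables the simultaneous interaction described next.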

Figure 2: Simple hand gestures to interact with images and videos in the SenseWeb system.

  It is important to note that all the interactions described above can be performed by multiple users simultaneously, at any given time. Users don't need to take turns to grab an image or video to comment on or discuss with others. Avoiding this turn-taking is quite important in informal group discussions or brainstorming sessions, where no one in particular is in charge of the presentation; instead, everybody can be in control, even at the same time.
  Now, what are the real benefits of this "natural interface" and "multiple users" interaction model? Briefly stated, users can concentrate on the topic of discussion instead of worrying about how to interact with the information at hand. Also, while turn-taking is widely accepted as an efficient form of group communication, allowing simultaneous interaction may work better for free and informal discussions.

3. Applications and Future Plans
  Networked communication is important for remotely located people, but a face-to-face multi-user environment is probably better suited to group sharing of experiences. Moreover, a natural interface for computer interaction, such as natural hand gestures, helps users concentrate on the topic of discussion rather than on the interaction itself.
  Following the above requirements, the SenseWeb system was designed as a prototype to support group sharing of experiences and ideas in a natural way. The system can be thought of as consisting of two parts: 1) a multi-user interface and 2) a multimedia retrieval application.
  We have also used the multi-user interface part for other applications, including collaborative Internet image and sound browsing, collaborative image classification, multi-user games, and a navigation system for exhibits. We would like to see it applied to edutainment, business brainstorming, and interactive image bulletin boards, among others.
  In the near future, we plan to add user identification and analysis of interaction patterns to provide adaptive visualization on a per-user basis. We also plan to integrate other input and output devices, such as cellular phones, wireless notebooks, and PDAs. In this way, the system could provide more detailed information and richer interaction capabilities to better support the communication process.
  Of course, sharing experiences involves much more than what we have discussed here. Our laboratory's main goal is to better understand and support the sharing of experiences by considering context and location awareness, the semantics of utterances, emotions, and facial expressions, as well as other sensory cues such as smell and force feedback.
  By integrating the basic technology of SenseWeb with these five-sense interfaces, we expect to see applications in the near future that will lead to more comfortable engagement with information in office environments, apartment houses, and commercial and shopping spaces.
  SenseWeb is now a registered trademark.

References

[1] R. Lopez-Gulliver, N. Hagita, M. Suzuki, T. Sato, H. Tochigi, "SenseWeb: A Multi-user Environment for Browsing Images from the Internet", In Proc. of Int'l Conf. on Multimedia and Expo (ICME 2004), Taipei, Taiwan, Jun. 2004.
[2] R. Lopez-Gulliver, H. Tochigi, T. Sato, M. Suzuki, "SenseWeb: Collaborative Image Classification in a Multi-User Interaction Environment", In Proc. of ACM Multimedia 2004, pp. 456-459, New York, USA, Oct. 2004.
[3] H. Tochigi, R. Lopez-Gulliver, T. Sato, M. Suzuki, "Evaluation of Collaborative Image Classification in the Hands-on Information Sharing System SenseWeb" (in Japanese), IPSJ SIG Technical Report, 2004-GN-51(20), pp. 115-120, Mar. 2004.