TR-I-0196 :1991.2.26

Dieter Huber

A Bilingual Dialogue Database for Automatic Spoken Language Interpretation between Japanese and English

Abstract: This report presents a bilingual Japanese-English dialogue database that is currently being constructed at ATR for research in spoken language interpretation via telephone. Ten subjects participated in the recording of the dialogues: five are native speakers of Standard Japanese (2 female, 3 male), and the other five are native speakers of British (1 male) and American (2 female, 2 male) English. The material consists of seven short dialogues that were chosen from the ATR Linguistic Database and represent typical conversations that may reasonably be expected to occur in conference registration by telephone. The Japanese data were spoken in both continuous and isolated-phrase (Bunsetsu) modes, the English data in continuous mode only. The material was recorded in an anechoic, sound-insulated studio at the ATR Auditory and Visual Perception Research Laboratories, using high-quality digital recording equipment. In total, the material comprises approximately five hours of recorded dialogues. It is intended for the dedicated study of prosodic transfer in spoken language interpretation and for training continuous speech recognition systems in the field of interpreting telephony.