Speech Translation is Growing Up:
From Concept Demonstration
to Lasting Impact

Alex Waibel
Professor of Computer Science
Director, International Center for Advanced Communication Technologies (InterACT)
Carnegie Mellon University & University of Karlsruhe

It is indeed a great honor to write an article for ATR Up-to-Date, and a fitting moment to do so: it is just about 15 years since ATR in Japan and our laboratories in Japan, the US and Europe began a joint adventure to tackle a difficult problem. Thanks to ATR's early (1987) leadership, its support, vision and friendship, a new research field could begin. ATR had just been established as a company, and speech translation was chosen as one of its four key technology areas. To address the difficulties and to consider the many aspects of many languages, the international Consortium for Speech Translation Advanced Research (C-STAR) was formed in 1991. It included just four partners at first (among them ATR and our laboratories at CMU and UKA), but it has since grown to include 20 of the leading laboratories around the world.

The problem of speech translation is enormously difficult because it combines the difficulties of accurate speech recognition, acceptable machine translation and natural-sounding speech synthesis, none of which can be considered a solved problem by any measure. Researchers in the component fields often derided the early efforts as unmanageable, intractable, impractical, and even not useful (!), given the poor solutions and poor performance that existed in the component technologies at the time. Undeterred, those early researchers persisted, first demonstrating feasibility (1991), then capabilities for spontaneous speech (1993-1999) and now practical, fieldable solutions. In an age of globalization, where business, humanitarian, healthcare and security needs have rapidly grown beyond national boundaries, the enormous importance, indeed the absolute necessity, of cross-lingual technologies in every form (text, speech, image) was bound to be recognized before too long.

Now, 15 years later, virtually all governments and research sponsoring agencies in the developed world support significant efforts in the area of speech translation. Indeed, perhaps the largest ongoing research programs that still fund speech and language research at all, in Europe (TC-STAR, CHIL) and in the US (DARPA-GALE, DARPA-TRANSTAC, NSF-STR-DUST), are now committed to cracking the speech translation problem and related cross-lingual language requirements. The rapidly growing need for fast, effective response to multilingual information and the need for effective cross-lingual communication call for technical solutions that deliver greater speed, broader language coverage and lower cost than human language services alone can possibly provide.

Given the 15-year history of speech translation research, and the tremendous effort and investment currently underway, one might ask whether the problem is almost solved and, if not, what challenges remain. In my view, four challenges remain:

●Robustness - Speech translation must be reliable in all circumstances for which it is to be employed, and it must deliver trusted output. How can the output of a technical device be trusted? Unlike humans, machines are woefully inadequate in judging the plausibility of their own output and articulating their own self-doubt. Robustness also remains a challenge when we consider not only clean speech input, but also highly disfluent conversational speech, noisy environments, distant microphones and stressed or emotional speech.

●Domain Unlimited Capability - While a number of practical applications can be fielded that require translation capability only in limited domains, the domain restriction of most of today's systems must be removed. This is necessary if we wish to provide translation for open-domain spoken language tasks such as Broadcast News, Lectures/Seminars, Parliamentary Speeches, Meetings and Telephone Conversations. Domain-unlimited speech translation, in turn, must cope with disfluent, conversational speech as well as large, open-domain language and vocabulary coverage.

●Language Portability - Sadly, most current efforts are concentrated on only a few languages of general interest: English, Chinese, Arabic, Spanish, Japanese, German, ... Perhaps the greatest social impact of translation technology, however, could come from capabilities in less commonly spoken languages and language pairs, where language tools and human translation services are less readily available. Even short of covering the 6,000 languages of the world, managing just the 20 languages of an expanding Europe already presents great difficulty and cost. Can more advanced machine learning techniques help to lower the cost of development and improve language portability?

●Human Delivery - For the language barrier to become invisible, we also have to be concerned with appropriate human interfaces that deliver language services in an unobtrusive way. Clearly, spoken input is preferable in mobile situations or meeting situations, but images may require photo or video input, or a mixture of image and voice. How should output be presented? By voice? By text? Should it be delivered via headphones, heads-up displays, speakers? Should it run on a PDA, mobile phone, laptop, or be implanted in a ubiquitous intelligent environment? Numerous intriguing possibilities exist.

Speech translation as a research field has grown up. It has been a privilege to collaborate with ATR for 15 years on the problem of speech translation and to help build foundations in a field of growing importance. Looking toward the future, many open challenges remain, but the excitement does not let up: Where else can scientists be offered the dual benefit of scientific fascination with a grand challenge problem and a guaranteed opportunity to change the world for the better? Because our efforts have been so fruitful, growing international recognition of the problem also brings increasingly intense worldwide competition over new systems, solutions and standards. ATR's pioneering benchmarking exercises, launched under IWSLT, provide a mechanism and a forum for these forces and for the best laboratories around the world to advance the state of the art rapidly and jointly. International exchange will continue to refresh and deepen our understanding of the problems and accelerate the turnaround in implementing viable solutions. Our laboratories look forward to continuing and further deepening the strong collaboration and friendship that we began with ATR 15 years ago.
