TR-SLT-0038: 2003.4.30

Stephen Nightingale

Corpus Processing for Machine Translation Experiments and Tools

Abstract: This report details experiments in corpus processing for machine translation, involving public-domain and ATR-developed tools run in complex sequences. The ultimate objective of SLT Department 4 is simultaneous interpretation of news, and this work supports the machine translation component of that goal. The first experiment covers the processes required to perform Statistical Machine Translation (SMT) using the ATR Basic Travel Expressions Corpus. The available news corpora are not, however, immediately suited to SMT, so subsequent experiments investigate and extract such parallel, alignable resources as are available. The results of these experiments are analysed in published papers referenced in the text. The aim of the report is to identify and document useful tools and configurations for corpus processing.