Stephen Nightingale
Corpus Processing for Machine Translation
Experiments and Tools
Abstract: This report details experiments in corpus processing for Machine Translation,
using both public-domain and ATR-developed tools run in complex sequences. The
ultimate objective of SLT Department 4 is Simultaneous Interpretation of News, and this
work supports the Machine Translation component of that goal. The first
experiment covers the processes required to perform Statistical Machine
Translation (SMT) using the ATR Basic Travel Expressions Corpus. The available News
corpora, however, are not immediately suited to SMT, so subsequent experiments
investigate them and extract whatever parallel, alignable resources they contain.
The results of these experiments are analysed in the published papers referenced
in the text. The aim of this report is to identify and document useful tools and
configurations for corpus processing.