TR-SLT-0054: December 18, 2003

Vivien LE POUPON, Taro Watanabe

Implementation of EM-IS Algorithm for Machine Translation

Abstract: Our goal is to implement the EM-IS algorithm to build a lexicon model, and then to try using it in conjunction with an IBM model. We follow a Maximum Entropy (ME) approach, extended to Latent ME because standard ME is limited by the scarcity of empirical data. The principle consists of embedding the iterative scaling loop in an EM procedure in order to determine the weighting parameters associated with the different features, so that these parameters naturally match the information contained in the corpus. We developed this method using a corpus of English-Japanese pairs.
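To make the principle concrete, the following is a minimal sketch of an iterative scaling loop embedded in an EM procedure. It is not the implementation described in this report. The toy space of one observed variable y and one latent variable z, the four binary features, the empirical distribution, and the names `joint`, `log_likelihood` and `em_is` are all assumptions made for illustration. Generalized Iterative Scaling (GIS) is used for the inner loop as one possible choice of iterative scaling.

```python
import math
from itertools import product

# Hypothetical toy space: y is observed, z is latent.
YS = [0, 1]
ZS = [0, 1]
X = list(product(YS, ZS))

# Hypothetical binary features. They sum to C = 2 on every (y, z),
# which is the constant-sum condition that GIS requires.
FEATURES = [
    lambda y, z: 1.0 if y == z else 0.0,
    lambda y, z: 1.0 if y != z else 0.0,
    lambda y, z: 1.0 if z == 0 else 0.0,
    lambda y, z: 1.0 if z == 1 else 0.0,
]
C = 2.0


def joint(lam):
    """Log-linear joint distribution p(y, z) proportional to exp(sum_i lam_i f_i(y, z))."""
    w = {x: math.exp(sum(l * f(*x) for l, f in zip(lam, FEATURES))) for x in X}
    total = sum(w.values())
    return {x: v / total for x, v in w.items()}


def log_likelihood(lam, p_emp):
    """Expected log-likelihood of the observed variable y under the model."""
    p = joint(lam)
    return sum(p_emp[y] * math.log(sum(p[(y, z)] for z in ZS)) for y in YS)


def em_is(p_emp, em_iters=20, is_iters=5):
    # Asymmetric start: an all-zero start is a stationary point on this toy problem.
    lam = [0.1, 0.0, 0.2, 0.0]
    history = []
    for _ in range(em_iters):
        # E-step: feature targets under p_emp(y) * p(z | y) with current weights.
        p = joint(lam)
        target = [0.0] * len(FEATURES)
        for y in YS:
            p_y = sum(p[(y, z)] for z in ZS)
            for z in ZS:
                posterior = p[(y, z)] / p_y
                for i, f in enumerate(FEATURES):
                    target[i] += p_emp[y] * posterior * f(y, z)
        # M-step: a few GIS updates that move model expectations toward the targets.
        for _ in range(is_iters):
            p = joint(lam)
            expect = [sum(p[x] * f(*x) for x in X) for f in FEATURES]
            lam = [l + math.log(t / e) / C for l, t, e in zip(lam, target, expect)]
        history.append(log_likelihood(lam, p_emp))
    return lam, history


if __name__ == "__main__":
    weights, ll_history = em_is({0: 0.7, 1: 0.3})
    print(weights)
    print(ll_history)
```

In this sketch the E-step fixes the feature targets using the current posterior over the latent variable, and the inner loop then rescales the weights toward those targets. Because each GIS update does not decrease the M-step objective, the observed-data log-likelihood recorded in `ll_history` should be non-decreasing from one EM iteration to the next.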