TR-SLT-0076 : 2005.03.24 (Internal Use)

Tadashi Kumano, Hideki Tanaka, Hideki Kashioka, Takahiro Fukusima

Acquiring Bilingual Named Entity Translations from Content-aligned Corpora

Abstract: We propose a new method for acquiring bilingual named entity (NE) translations from non-literal, content-aligned corpora. It first recognizes NEs in each document of a bilingual document pair using an NE extraction technique, then finds NE groups whose members share the same referent, and finally establishes correspondences between the NE groups of the two languages. Exhaustive detection of NEs can potentially acquire translation pairs with broad coverage. The correspondences between bilingual NE groups are estimated from the similarity of their order of appearance in each document, and correspondence performance reached F = 71.0% when a small bilingual dictionary was used in combination.
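
The correspondence step described in the abstract can be illustrated with a minimal sketch. It is not the method of this report: the scoring function (one minus the difference in relative appearance position), the dictionary bonus, the threshold, and the greedy one-to-one matching are all assumptions chosen for illustration, as are the function names and the toy data. NE groups are assumed to be already extracted and listed in order of first appearance.

```python
from typing import Dict, List, Set, Tuple


def relative_positions(groups: List[str]) -> Dict[str, float]:
    """Map each NE group, listed in order of first appearance, to a position in [0, 1]."""
    n = len(groups)
    if n == 1:
        return {groups[0]: 0.0}
    return {g: i / (n - 1) for i, g in enumerate(groups)}


def align_groups(
    src: List[str],
    tgt: List[str],
    dictionary: Set[Tuple[str, str]],
    dict_bonus: float = 1.0,   # hypothetical weight for a dictionary hit
    threshold: float = 0.5,    # hypothetical minimum score for a pair
) -> List[Tuple[str, str]]:
    """Greedy one-to-one alignment of NE groups by appearance-order similarity."""
    src_pos = relative_positions(src)
    tgt_pos = relative_positions(tgt)

    scored = []
    for s in src:
        for t in tgt:
            # Order similarity: 1.0 when both groups appear at the same
            # relative position in their documents, 0.0 at opposite ends.
            score = 1.0 - abs(src_pos[s] - tgt_pos[t])
            # A small bilingual dictionary raises the score of known pairs.
            if (s, t) in dictionary:
                score += dict_bonus
            scored.append((score, s, t))

    # Highest score first; the sort is stable, so ties keep document order.
    scored.sort(key=lambda item: -item[0])

    used_src, used_tgt = set(), set()
    pairs = []
    for score, s, t in scored:
        if score < threshold:
            break
        if s in used_src or t in used_tgt:
            continue
        pairs.append((s, t))
        used_src.add(s)
        used_tgt.add(t)
    return pairs


# Toy data (invented for illustration): the target document mentions
# the last two entities in the opposite order.
src_groups = ["小泉首相", "東京", "国連"]
tgt_groups = ["Prime Minister Koizumi", "United Nations", "Tokyo"]
small_dictionary = {("東京", "Tokyo")}

pairs = align_groups(src_groups, tgt_groups, small_dictionary)
```

In this toy case the dictionary entry anchors 東京 to Tokyo despite the differing order, which in turn leaves 国連 free to pair with United Nations. Order similarity alone would have paired 東京 with United Nations.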