Tadashi Kumano, Hideki Tanaka, Hideki Kashioka, Takahiro Fukusima
Acquiring Bilingual Named Entity Translations
from Content-aligned Corpora
Abstract: We propose a new method for acquiring bilingual named entity (NE) translations from
non-literal, content-aligned corpora. It first recognizes NEs in each document of a bilingual
pair using an NE extraction technique, then finds NE groups whose
members share the same referent, and finally finds correspondences between the bilingual NE groups.
The exhaustive detection of NEs can potentially acquire
translation pairs with broad coverage. The correspondences between bilingual NE
groups are estimated based on the similarity of their order of appearance in each document,
and the correspondence performance reached an F-measure of 71.0% when a small bilingual
dictionary was also used.
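
The correspondence step can be illustrated with a minimal sketch. It is not the method evaluated in this paper: the abstract does not specify the similarity measure, so the sketch assumes that each NE group is represented by the token positions of its mentions, that order similarity is one minus the difference in the relative position of the first mention, that a dictionary match adds a fixed bonus, and that groups are paired greedily. The names `relative_positions`, `align_groups`, and `dict_bonus` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def relative_positions(groups: Dict[str, List[int]], doc_len: int) -> Dict[str, float]:
    """Map each NE group to the relative position (0.0 to 1.0) of its first mention.

    Assumes every group has at least one mention position.
    """
    denom = max(doc_len - 1, 1)  # avoid division by zero for one-token documents
    return {name: min(positions) / denom for name, positions in groups.items()}


def align_groups(
    src: Dict[str, List[int]],
    src_len: int,
    tgt: Dict[str, List[int]],
    tgt_len: int,
    dictionary: FrozenSet[Tuple[str, str]] = frozenset(),
    dict_bonus: float = 0.5,  # assumed weight, not taken from the paper
) -> List[Tuple[str, str]]:
    """Greedily pair source and target NE groups by order-of-appearance similarity."""
    src_pos = relative_positions(src, src_len)
    tgt_pos = relative_positions(tgt, tgt_len)

    # Score every candidate pair: closer relative positions give higher scores.
    scored = []
    for s, sp in src_pos.items():
        for t, tp in tgt_pos.items():
            score = 1.0 - abs(sp - tp)
            if (s, t) in dictionary:
                score += dict_bonus  # reward pairs supported by the dictionary
            scored.append((score, s, t))

    # Take the best-scoring pairs first, using each group at most once.
    scored.sort(key=lambda item: -item[0])
    used_src, used_tgt, pairs = set(), set(), []
    for _, s, t in scored:
        if s not in used_src and t not in used_tgt:
            pairs.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return pairs
```

With an empty dictionary the pairing depends on appearance order alone; passing a small set of known pairs as `dictionary` lets lexical evidence override the order-based score when two groups appear close together.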