Fadi Badra, Hirofumi Yamamoto
Comparative Study on
Multi-Class Composite N-grams
Applied to English and Japanese
Abstract: This document presents the research work I conducted at ATR for five months
from April to August 2003. My work mainly consisted of applying Multi-Class
Composite N-gram language models, proposed recently by ATR researchers H. Yamamoto,
S. Isogai and Y. Sagisaka (Yamamoto, 2001) and so far applied only to Japanese,
to the English language. The purpose of the experiments I conducted was to
provide experimental data on such a model for the two languages, and to determine to
what extent this new technique, which showed good results for Japanese, can be applied
to English as well. These tests were performed on training corpora of
different sizes, extracted from the B.T.E.C.; running the experiments on the
two languages under the same conditions enabled us to compare them directly.
The results showed that Multi-Class language models improve on conventional Class-Based
ones for English too, but with different optimal connectivity information and
only for small training corpora.