Rui Yang, Hirofumi Yamamoto, Yoshinori Sagisaka
Comparison of Chinese, Japanese and English:
Applying Multi-class N-gram Language Model
Abstract: This document presents my research work at ATR over one and a half months. In the Multi-class Composite N-gram language model (Yamamoto, 2001), backward information and forward information are merged with different weights; this has been shown to improve perplexity on both Japanese and English for small corpus sizes (Fadi Badra, 2003, TR-SLT-0046). In this experiment, the Multi-class bigram language model is applied to Chinese to determine which weighting of backward and forward information is better suited to Chinese. Based on the BTEC corpus (Chinese version, rev.1), the results show that the Multi-class N-gram model is also effective for Chinese.
The best weight for backward information is 1.0, while that for forward information is 0.0.
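
The weighting described above can be illustrated with a minimal sketch: word similarities computed from left (backward) and right (forward) contexts are merged with a single weight before word clustering. The toy corpus, cosine distance, and function names below are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch: merging backward (left-context) and forward (right-context) word
# similarity with one weight, in the spirit of the Multi-class N-gram model.
# Corpus, distance measure, and weight handling are assumptions for illustration.
from collections import Counter, defaultdict

corpus = [["我", "想", "去", "北京"], ["我", "要", "去", "上海"]]  # toy BTEC-like sentences

left_ctx, right_ctx = defaultdict(Counter), defaultdict(Counter)
for sent in corpus:
    padded = ["<s>"] + sent + ["</s>"]
    for prev, word, nxt in zip(padded, padded[1:], padded[2:]):
        left_ctx[word][prev] += 1    # backward information: what precedes the word
        right_ctx[word][nxt] += 1    # forward information: what follows the word

def context_distance(c1, c2):
    """1 - cosine similarity between two context count vectors."""
    keys = set(c1) | set(c2)
    dot = sum(c1[k] * c2[k] for k in keys)
    n1 = sum(v * v for v in c1.values()) ** 0.5
    n2 = sum(v * v for v in c2.values()) ** 0.5
    return 1.0 - (dot / (n1 * n2) if n1 and n2 else 0.0)

def merged_distance(w1, w2, backward_weight):
    """Weighted merge of backward and forward context distances.
    backward_weight = 1.0 corresponds to the setting the abstract reports
    as best for Chinese (forward weight 0.0)."""
    d_back = context_distance(left_ctx[w1], left_ctx[w2])
    d_fwd = context_distance(right_ctx[w1], right_ctx[w2])
    return backward_weight * d_back + (1.0 - backward_weight) * d_fwd

print(merged_distance("北京", "上海", backward_weight=1.0))
```

With backward_weight set to 1.0, only the left-context distance contributes, which matches the best setting reported for Chinese in this experiment.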