TR-SLT-0049 : 2003.09.18

Rui Yang, Hirofumi Yamamoto, Yoshinori Sagisaka

Comparison of Chinese, Japanese and English: Applying Multi-class N-gram Language Model

Abstract: This document presents my research work at ATR over one and a half months. The Multi-class Composite N-gram language model (Yamamoto, 2001) merges backward and forward information with different weights. It has been shown to improve perplexity on both Japanese and English for small corpus sizes (Fadi Badra, 2003, TR-SLT-0046). In this experiment, the Multi-class bigram language model was applied to Chinese to determine which weights of the backward and forward information work best for Chinese. Based on the BTEC corpus (Chinese version, rev.1), the results show that the Multi-class N-gram model is also effective for Chinese. The best weight for backward information is 1.0, while that for forward information is 0.0.
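
As a minimal illustration of the weighting mentioned above (the exact formulation is given in Yamamoto, 2001, and is not reproduced here), the backward and forward information for a word w can be thought of as combined linearly; the weight lambda and the terms S_back and S_fwd below are assumed, illustrative notation only:

\[
  S(w) \;=\; \lambda\, S_{\mathrm{back}}(w) \;+\; (1 - \lambda)\, S_{\mathrm{fwd}}(w),
  \qquad \lambda \in [0, 1]
\]

Under this reading, the best setting reported here for Chinese corresponds to lambda = 1.0, i.e. all weight on the backward information and none on the forward information.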