TR-SLT-0057: January 27, 2004

Tam Wai Lok, Kyonghee Paik

Comparative Analysis of Chinese, Japanese and Korean Numeral Classifiers

Abstract: We present our analysis of numeral classifiers extracted from Japanese, Korean, and Chinese corpora. We compare how numeral classifiers are matched with their referents in our corpora against the results produced by the algorithm of Bond and Paik (2000), which generates classifiers using semantic classes from an ontology provided by Goi-Taikei. We also attempt to automatically analyze the Japanese sentences containing classifiers by typing the classifiers following Bond (2001) and the syntactic constructions following Asahioka et al. (1990). We identify some problematic constructions in Chinese and Japanese and point out that classifier types change in the course of translation. We also show that the anaphoric use of numeral classifiers is problematic for machine translation. In conclusion, we point out the difficulty of predicting the correct numeral classifiers when translating between Chinese, Japanese and Korean, because both the domain covered by the same type of classifier and the constructions containing numeral classifiers vary. For further work, we suggest analyzing classifier constructions using a statistical model based on the data produced here and applying word sense disambiguation techniques to the referents.