TR-IT-0333: January 2000

Shuwu Zhang

Distance-related Unit Association Maximum Entropy (DUAME) Language Modeling

Abstract: In this report, we propose distance-related unit association maximum entropy (DUAME) language modeling. Compared with conventional N-gram modeling, the major characteristics of DUAME modeling are: (1) Instead of using a longer N-gram, it models an event (a unit subsequence) through the exponential co-occurrence of full-distance unit association (UA) features, and is thus functionally comparable to a higher-order N-gram. (2) DUAME modeling smooths the distribution of a partially unobserved event through the exponential co-occurrence of the decreasing set of UA features that were observed. It predicts such events more accurately than conventional backoff or interpolation smoothing in N-gram modeling. (3) Because every UA feature in a DUAME model involves only two units, storing the feature parameters requires much less memory, so exploiting longer-distance language correlations is more affordable in terms of memory than with higher-order N-gram features. Preliminary experimental results show that DUAME modeling is useful for improving current N-gram language modeling in speech recognition.
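The abstract does not state the model's equations, so the following is only a minimal sketch of how distance-related pairwise features might combine in a standard conditional maximum entropy (log-linear) form. The function name `duame_prob`, the `(distance, history_unit, predicted_unit)` feature key, the `max_dist` parameter, and the toy weights are all assumptions made for illustration and are not taken from the report.

```python
import math

def duame_prob(history, vocab, weights, max_dist=3):
    """Return P(w | history) for every w in vocab (illustrative only).

    Assumed form: P(w | h) is proportional to
    exp(sum over d of weights[(d, h[-d], w)]),
    where each feature links the predicted unit to one history unit
    at distance d. Features missing from `weights` contribute 0.
    """
    scores = {}
    for w in vocab:
        total = 0.0
        for d in range(1, max_dist + 1):
            if d <= len(history):
                # Each feature involves only two units plus a distance.
                total += weights.get((d, history[-d], w), 0.0)
        scores[w] = math.exp(total)
    z = sum(scores.values())  # normalization over the vocabulary
    return {w: s / z for w, s in scores.items()}

# Toy, hand-picked weights (hypothetical, not trained).
vocab = ["a", "b", "c"]
weights = {(1, "x", "a"): 1.0, (2, "y", "a"): 0.5, (2, "y", "b"): 1.0}

# All distances observed for "a": both of its features fire.
p = duame_prob(["y", "x"], vocab, weights)

# The distance-1 unit "z" has no features; the distance-2 features still apply.
q = duame_prob(["y", "z"], vocab, weights)
```

In this sketch, a history whose nearest unit has no stored feature still receives a non-uniform distribution from the remaining distance features, which is one plausible reading of the smoothing behavior described in point (2). The parameter table is keyed by unit pairs and a distance, so under these assumptions its size grows with the number of observed pairs per distance rather than with the number of observed N-tuples.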