TR-IT-0071 :1994.9.7

Christian Boitet

On the automatic transformation of a set of EBMT Constituent Boundary Patterns into a Context-Free Grammar, and associated bottom-up algorithms

Abstract:EBMT as pursued at ATR uses a set of “Constituent Boundary Patterns" to describe the input language. We show how to automatically convert such a set P into an equivalent CFG G=T(P), which is not only weakly (generatively) equivalent, but quasi-structurally equivalent: there is a very simple homomorphism transforming a G-tree into the corresponding P-tree. As a result, any classical CFG-based analysis algorithm may be used with G=T(P), and the parse trees converted to P-trees. In particular, all bottom-up algorithms are applicable. However, due to the layered character of the set of non-terminals of any such G, it is possible to propose a simple and efficient (non- deterministic) bottom-up analysis algorithm. Because special "constituent boundary" tags are introduced by a pre-processing phase into the strings described by P (and hence by G), it is also possible to adapt the idea of "operator precedence parsing" to G, and perhaps even directly to P.