TR-IT-0049: March 1994

Yves Lepage

Texts and Structures: Pattern-matching and Distances

Abstract: The basic data structures of natural language processing, strings and trees, are generalised into a single data structure called a wood. On this data structure, pattern-matching is formalised as identification. A neutral element is constructed, which allows a formal interpretation of variables in patterns. A distance on this data structure can also be defined; it extends and generalises well-known metrics on strings and trees. The links between pattern-matching and distance are presented, together with results in formal language theory.

Keywords: String, tree, wood, matching, identification, distance, metric.