Kybernetika 38 no. 1, 91-103, 2002

Implementation of directed acyclic word graph

Miroslav Balík

Abstract:

An effective implementation of a Directed Acyclic Word Graph ({\sl DAWG}) automaton is shown. A {\sl DAWG} for a text $T$ is a minimal automaton that accepts all substrings of a text $T$, so it represents a complete index of the text. While all usual implementations of {\sl DAWG} needed about 30 times larger storage space than was the size of the text, here we show an implementation that decreases this requirement down to four times the size of the text. The method uses a compression of {\sl DAWG} elements, i. e. vertices, edges and labels. The construction time of this implementation is linear with respect to the size of the text, a search for a specific pattern is done in a linear time with respect to the size of the pattern. This implementation preserves both good properties of the {\sl DAWG} automaton.