Kybernetika 38 no. 1, 81-90, 2002

Dynamic programming for reduced NFAs for approximate string and sequence matching

Jan Holub

Abstract:

Approximate string and sequence matching is the problem of searching for all occurrences of a pattern (string or sequence) in some text, where the pattern may occur with a limited number of errors measured by edit distance. Several methods designed for approximate string matching simulate the nondeterministic finite automata (NFAs) constructed for this problem. This paper presents reduced NFAs for approximate string matching, usable when we are interested only in occurrences whose edit distance is less than or equal to a given integer, and not in the exact edit distance of each found occurrence. An algorithm based on dynamic programming that simulates these reduced NFAs is then presented. It is also shown how to use this algorithm for approximate sequence matching.
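For orientation only, the general idea of simulating such an automaton with dynamic programming can be sketched with the classical column-wise edit-distance recurrence (Sellers' algorithm). This is not the paper's reduced-NFA algorithm. It is a minimal illustration assuming that each cell `col[i]` can be read as the lowest error level at which the NFA state for pattern prefix length `i` is active. Because only "edit distance at most k" matters here, every value above `k` is saturated to `k + 1`, so exact distances beyond the threshold are never tracked. The function name `approx_occurrences` is hypothetical.

```python
def approx_occurrences(pattern: str, text: str, k: int) -> list[int]:
    """Return 0-based end positions in `text` where `pattern` occurs
    with edit distance <= k (illustrative sketch, not the paper's method)."""
    m = len(pattern)
    cap = k + 1  # any distance above k is stored as k + 1 (saturation)
    # col[i]: min edit distance between pattern[:i] and some suffix
    # of the text read so far, capped at k + 1
    col = [min(i, cap) for i in range(m + 1)]
    ends = []
    for pos, c in enumerate(text):
        prev_diag = col[0]  # col[i - 1] from the previous text position
        col[0] = 0  # the empty pattern prefix matches anywhere
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                prev_diag + cost,  # match or replace
                old + 1,           # extra text symbol (insertion)
                col[i - 1] + 1,    # missing pattern symbol (deletion)
                cap,               # saturate: exact values > k are irrelevant
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(pos)
    return ends
```

For example, `approx_occurrences("abc", "abd", 1)` reports end positions 1 and 2: "ab" matches "abc" with one deletion, and "abd" matches it with one replacement. Capping after every step is safe because each recurrence term adds only 0 or 1 to a predecessor, so the stored value always equals the minimum of the true distance and `k + 1`.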