Kybernetika 48 no. 3, 429-452, 2012

Tree compression pushdown automaton

Jan Janoušek, Bořivoj Melichar and Martin Poliak


A new kind of a deterministic pushdown automaton, called a \emph{Tree Compression Automaton}, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with $n$ nodes, the automaton has at most $n+1$ states, its transition function cardinality is at most $4n$ and there are $2n+1$ pushdown store symbols. If hashing is used for storing automaton's transitions, thus removing a factor of $log n$, the construction of the automaton takes linear time and space with respect to the length $n$ of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.


pushdown automata, trees, indexing trees, arbology, compression


05C05, 68Q68


  1. : Arbology www pages: Available on \url{}, March 2012.   CrossRef
  2. A. V. Aho and J. D. Ullman: The Theory of Parsing, Translation, and Compiling. Prentice-Hall Englewood Cliffs, N.J. 1972.   CrossRef
  3. H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov and P. E. Bourne: The protein data bank. Nucleic Acids Res. 28 (2000), 235-242.   CrossRef
  4. P. Bille: Pattern Matching in Trees and Strings. Ph.D. Thesis, FIT University of Copenhagen 2008.   CrossRef
  5. G. Busatto, M. Lohrey and S. Maneth: Grammar-based Tree Compression. Technical Report 2004.   CrossRef
  6. L. Cleophas: Tree Algorithms. Two Taxonomies and a Toolkit. Ph.D. Thesis, Technische Universiteit Eindhoven 2008.   CrossRef
  7. R. Cole, R. Hariharan and P. Indyk: Tree pattern matching and subset matching in deterministic $log^3$-time. SODA (1999), 245-254.   CrossRef
  8. H. Comon, M. Dauchet, R. Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison and M. Tommasi: Tree automata techniques and applications. Available on: \url{} (2007).   CrossRef
  9. F. Gecseg and M. Steinby: Tree languages. In: Handbook of Formal Languages (G. Rozenberg and A. Salomaa, eds.), Vol. 3 Beyond Words. Springer-Verlag, Berlin 1997, pp. 1-68.   CrossRef
  10. T. Flouri, J. Janoušek and B. Melichar: Subtree and tree pattern pushdown automata for trees in prefix notation (2009). Comput. Sci. Inform. Systems 7 (2010), 2, 331-357.   CrossRef
  11. Ch. Hoffmann and M. O'Donnell: Pattern matching in trees. J. Assoc. Comput. Mach. 29 (1982), 1, 68-95.   CrossRef
  12. J. E. Hopcroft, R. Motwani and J. D. Ullman: Introduction to Automata Theory, languages, and Computation. Second edition. Addison-Wesley, Boston 2001.   CrossRef
  13. C. H. Wu, R. Apweiler, A. Bairoch, D. A. Natale, W. C. Barker, B. Boeckmann, S. Ferro, E. Gasteiger, H. Huang and R. Lopez et al: The Universal Protein Resource (UniProt) - An Expanding Universe of Protein Information. Database issue, Nucleic Acids Research, Oxford University Press, D187-D191 2006.   CrossRef
  14. A. R. Schmidt, F. Waas, M. L. Kersten, M. J. Carey, I. Manolescu and R. Busse: XMark - A benchmark for XML data management. In: Proc. International Conference on Very Large Data Bases (VLDB), Hong Kong 2002, pp. 974-985.   CrossRef
  15. L. Van Der Beek, G. Bouma, R. Malouf, G. Van Noord and Rijksuniversiteit Groningen: The Alpino dependency treebank. In: Computational Linguistics in the Netherlands (CLIN) 2002, pp. 1686-1691.   CrossRef
  16. T. Welch: A technique for high-performance data compression. Computer 17 (1984), 6, 8-19.   CrossRef
  17. J. Ziv and A. Lempel: A universal algorithm for sequential data compression. IEEE Trans. Inform. Theory 23 (1977), 3, 337-343.   CrossRef