Kybernetika 40 no. 3, 275-292, 2004

Efficiency-conscious propositionalization for relational learning

Filip Železný

Abstract:

Systems that aim to discover interesting knowledge in data, now commonly called data mining systems, are typically employed to find patterns in a single relational table. Most mainstream data mining tools are not applicable to the more challenging task of finding knowledge in structured data represented by a multi-relational database. Although a family of methods known as inductive logic programming has been developed to tackle that challenge directly, the idea of converting structured data into a simpler form digestible by the wealth of attribute-value learning (AVL) systems has always been tempting to data miners. To this end, we present a method, based on constructing first-order logic features, that performs this kind of conversion, also known as propositionalization. It incorporates some basic principles suggested in previous research and provides significant enhancements that lead to remarkable improvements in the efficiency of the feature-construction process. We begin by motivating the propositionalization task with an illustrative example, then review some previous approaches to propositionalization, and formalize the concept of a first-order feature, elaborating mainly on the points that influence the efficiency of the designed feature-construction algorithm.