Kybernetika 36 no. 6, 605-616, 2000

Fuzzy querying: issues and perspectives

Janusz Kacprzyk, Gabriella Pasi, Peter Vojtáš and Sławomir Zadrożny

Abstract:

The term {\sl query} is widely used in both the database and the information retrieval communities. Basically, a query against a collection of information items (hereafter called, for brevity, an information source) provides a formal description of the items of interest to the user posing the query. An information source is meant here very broadly: it may be an archive of multimedia or textual documents, a database, or a knowledge base. In these three cases the information items are {\sl documents}, {\sl records} ({\sl rows} in the relational data model) and {\sl facts}, respectively. To manage and access an information source, an appropriate system is defined that makes it possible to store, represent and retrieve information items by means of a formal {\sl query language}. The systems that manage these three kinds of information items are {\sl information retrieval systems}, {\sl database management systems} and {\sl knowledge-based systems}, respectively. The query languages of these systems usually refer to some features of the entities represented by the stored items, e.g., keywords (index terms) of textual documents (in a document archive), attributes (in a database) or arguments of facts (in a knowledge base). Thus, a query may basically be seen as a set of selection conditions that an information item (through its features) must meet to qualify as relevant with respect to the query. Correspondingly, {\sl query processing} itself may be seen as consisting mainly of {\sl matching} the query against the items of the information source. This process may be considerably more complex, e.g., in the case of knowledge bases, where a whole chain of matching steps takes place within the reasoning process. A user often faces the problem of how to express her or his information requirements in the formal query language supported by the interface of a given information system.
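To make the view of a query as a set of selection conditions more concrete, the following Python sketch treats a crisp query as a conjunction of conditions over record attributes, and query processing as matching each record against all of them. The apartment records, the attribute names and the functions `crisp_match` and `process_query` are hypothetical illustrations, not taken from any particular system.

```python
from typing import Any, Callable, Dict, List, Tuple

# A record (row) maps attribute names to values; hypothetical example schema.
Record = Dict[str, Any]
# A selection condition pairs an attribute name with a crisp (yes/no) test.
Condition = Tuple[str, Callable[[Any], bool]]


def crisp_match(record: Record, query: List[Condition]) -> bool:
    """Return True iff the record satisfies every selection condition."""
    return all(test(record[attr]) for attr, test in query)


def process_query(source: List[Record], query: List[Condition]) -> List[Record]:
    """Match the query against each item and keep only the qualifying ones."""
    return [r for r in source if crisp_match(r, query)]


apartments = [
    {"id": 1, "price": 900, "distance_km": 2.0},
    {"id": 2, "price": 1010, "distance_km": 1.5},
    {"id": 3, "price": 700, "distance_km": 6.0},
]

# Crisp query: "price < 1000 AND distance_km <= 3"
query = [("price", lambda p: p < 1000), ("distance_km", lambda d: d <= 3)]

print([r["id"] for r in process_query(apartments, query)])  # [1]
```

Here apartment 2 is rejected outright although its price exceeds the limit by only 1%: a crisp condition admits no degrees of satisfaction.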
These formal languages usually require a {\sl crisp} ({\sl precise}, {\sl unambiguous}) specification of a query, whereas for human beings a query is best expressed in natural language, a very powerful but ambiguous and imprecise medium. Thus, adding some flexibility to traditional querying systems seems to be a critical issue for enhancing their effectiveness and efficiency. In this paper, we discuss some recent advances and basic issues related to flexible querying based on fuzzy logic. We focus on two areas, corresponding to the type of information source under consideration: information retrieval, where we deal primarily with archives of textual documents, and database querying. Both areas share the same interest in fuzzy (linguistic) queries and in flexible matching against information items. However, each also has its own specific features, which are pointed out in the following sections. The third area, querying knowledge bases in a very broad sense, is dealt with in the paper by Peter Vojtáš in this special issue. Specifically, the concept of matching, essential for querying, may to some extent be identified with {\sl unification}, and that paper discusses the issues related to fuzzy unification. The matching of fuzzy concepts, from a slightly different perspective, is also the subject of the paper by Andrejková in this issue. Another contribution relevant to the flexible querying of knowledge bases is the paper by Ch. Marsala in this issue. Moreover, besides its application to querying itself, the concept of flexibility is usually extended to the representation of the information to be queried. This is particularly evident in information retrieval, where concepts of fuzzy logic fit very well into advanced indexing schemes for text documents. In the case of database management systems, ideas based on fuzzy logic have led to the development of models for representing imprecise/vague data.
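A minimal sketch of flexible matching for a fuzzy (linguistic) database query such as "price is low AND distance is short" is given below. It assumes that each linguistic term is modelled by a decreasing piecewise-linear membership function and that AND is interpreted as the minimum (one common choice among t-norms, not the only one). The breakpoints of the terms, the records and the helper names `decreasing`, `fuzzy_match` and `fuzzy_query` are hypothetical. Instead of a yes/no answer, each record receives a matching degree in [0, 1], and the answer is ranked by that degree.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
# A fuzzy condition maps an attribute value to a matching degree in [0, 1].
FuzzyCondition = Tuple[str, Callable[[float], float]]


def decreasing(full: float, zero: float) -> Callable[[float], float]:
    """Membership 1 up to `full`, falling linearly, and 0 from `zero` on."""
    def mu(x: float) -> float:
        if x <= full:
            return 1.0
        if x >= zero:
            return 0.0
        return (zero - x) / (zero - full)
    return mu


# Hypothetical linguistic terms; the breakpoints are illustrative only.
low_price = decreasing(full=800, zero=1200)
short_distance = decreasing(full=2.0, zero=5.0)


def fuzzy_match(record: Record, query: List[FuzzyCondition]) -> float:
    """Degree to which a record satisfies all conditions (AND as minimum)."""
    return min(mu(record[attr]) for attr, mu in query)


def fuzzy_query(source: List[Record], query: List[FuzzyCondition],
                threshold: float = 0.0) -> List[Tuple[float, Record]]:
    """Keep records whose degree exceeds `threshold`, best matches first."""
    scored = [(fuzzy_match(r, query), r) for r in source]
    kept = [pair for pair in scored if pair[0] > threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


apartments = [
    {"id": 1, "price": 900, "distance_km": 2.0},
    {"id": 2, "price": 1010, "distance_km": 1.5},
    {"id": 3, "price": 700, "distance_km": 6.0},
]

# Fuzzy query: "price is low AND distance is short"
query = [("price", low_price), ("distance_km", short_distance)]

for degree, r in fuzzy_query(apartments, query):
    print(r["id"], round(degree, 3))
# 1 0.75
# 2 0.475
```

With these illustrative parameters, the record priced at 1010 is no longer discarded; it is returned with a lower degree (0.475) than the one priced at 900 (0.75), while the record lying entirely outside "short" gets degree 0 and is dropped.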
These issues are also dealt with in the following sections. The paper is organized into two sections, dealing with information retrieval and database querying, respectively. It is meant to provide a synthetic overview of the research area of the papers appearing in this special issue of Kybernetika. The issue comprises extended versions of selected papers presented at the session on fuzzy querying at the FSTA'2000 Conference held in Liptovský Mikuláš (Slovak Republic) in the winter of 2000. Throughout, we refer to the other papers in this issue, indicating their relevance to the topics discussed here.