Title

From KDD scenario description to data mining qualitative benchmarks

Authors

Cyrille Masson and Jean-François Boulicaut
INSA Lyon, LIRIS CNRS FRE 2672
Batiment Blaise Pascal
69621 Villeurbanne cedex, France

Abstract

The inductive database framework assumes that complex knowledge discovery processes can be considered as querying processes. Querying inductive databases requires primitives to: (1) select, manipulate and query data, (2) select, manipulate and query a priori interesting patterns (i.e., the so-called inductive queries, which return the patterns that satisfy some constraints), and (3) cross over patterns and data (e.g., selecting the data in which some patterns hold).
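To make the three kinds of primitives concrete, here is a minimal Python sketch on a toy transactional data set with itemset patterns. It is an illustration only, not the query language discussed in the talk: the data and the function names (`select_data`, `inductive_query`, `cross_over`) are hypothetical, and the naive enumeration of candidate itemsets is only suitable for tiny examples.

```python
from itertools import combinations

# Toy transactional data (hypothetical): each row is the set of items it contains.
data = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b"},
    "t3": {"a", "c"},
    "t4": {"b", "c"},
}

def select_data(db, predicate):
    """(1) Data query: keep the rows whose item set satisfies the predicate."""
    return {tid: items for tid, items in db.items() if predicate(items)}

def frequency(db, itemset):
    """Number of rows in which every item of the itemset occurs."""
    return sum(1 for items in db.values() if itemset <= items)

def inductive_query(db, min_freq, constraint=lambda s: True):
    """(2) Inductive query: itemsets that are frequent and satisfy the constraint.

    Naive enumeration of all candidate itemsets (assumption: tiny item universe).
    """
    universe = sorted(set().union(*db.values()))
    result = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            itemset = frozenset(combo)
            if frequency(db, itemset) >= min_freq and constraint(itemset):
                result.append(itemset)
    return result

def cross_over(db, itemset):
    """(3) Crossing over: the rows in which the given pattern holds."""
    return select_data(db, lambda items: itemset <= items)

# Patterns occurring in at least 2 rows and containing item "a".
patterns = inductive_query(data, min_freq=2, constraint=lambda s: "a" in s)

# Rows in which the pattern {a, b} holds.
supporting_rows = cross_over(data, frozenset({"a", "b"}))
```

In this sketch, the result of an inductive query is an ordinary collection that can itself be filtered or fed back into a data query, which is the closure idea behind treating a discovery process as a sequence of queries.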

In this talk, we consider the formal description of KDD scenarios using a simple inductive database query language. Such a formalization is useful for (a) the transfer of KDD expertise and (b) the study of optimization schemes (e.g., compiling sequences of queries). We introduce an original motivation for such a formalization: the design of qualitative benchmarks for evaluating data mining systems. After characterizing the scenarios that can be used as qualitative benchmarks, we discuss a complex scenario related to gene expression data analysis. It involves two pattern domains (itemsets and sequential patterns) and gives rise to a typical sequence of complex inductive queries. We then illustrate how this scenario can be used for an objective evaluation of data mining tools.

Slides

PDF (171320 bytes) / PPT (348160 bytes)

Last modified: 2004/04/19 15:52:10 (UTC)