Title

Towards a Theory of Constraint-Based Itemset Mining

Authors

Johannes Gehrke

Abstract

Constraint-based itemset mining for questions such as ``find all frequent itemsets where the total price is at least $50'' is a very important data mining primitive. Over the last years, there has been much work on different classes of constraints and associated specialized algorithms that take as input constraints in one class and output all patterns that satisfy the constraints. Thus currently the field of constraint-based itemset mining is essentially a collection of specialized algorithms: Whenever a new type of constraint is proposed, a new specialized algorithm is developed to handle it, and these algorithms are highly tuned to take advantage of the unique properties of their associated constraints.

In this talk, I will first overview the state of the art in constraint-based itemset mining. Then I will talk about two first steps towards a complete theory of the field. First, I will introduce DualMiner, an algorithm that uses two types of constraints to prune its search space. Second, I will present a framework for constraint-based itemset mining that permits existing algorithms to extend to handle constraints for which they were not designed a-priori.

Slides

PDF (982215 bytes)

Last modified: $Date: 2004/04/05 11:47:10 $ (UTC)