Institute for Computer Science
Machine Learning and Natural Language Processing Lab
Dissertation: Constrained mining of patterns in large databases

A theoretical framework is introduced that models data mining problems as the answering of queries to inductive databases. Inductive queries are requests to find the patterns in a database that satisfy user-specified constraints. Analyzing the answer sets of inductive queries built from anti-monotonic and monotonic basic predicates with Boolean operators reveals interesting properties, such as "dimension", that are useful for query optimization. The concept of version spaces has been extended to "generalized version spaces" to encapsulate such answer sets. Generalized version spaces are closed under the usual set operations, which gives them a closure property akin to that of relational algebra. This generic theoretical framework has been applied to various application domains, and algorithms and optimization techniques have been devised that exploit the theoretical results to answer queries to inductive databases efficiently. Experiments show that these techniques are applicable in practice.
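The abstract's key observation can be checked on a toy example: the answer set of a conjunction of an anti-monotonic and a monotonic predicate is a single version space (bounded by a set G of most general and a set S of most specific answers), while a disjunction may need a union of several version spaces. The sketch below is not the dissertation's method. It assumes itemset patterns ordered by set inclusion, a hypothetical five-transaction database, minimum frequency as the anti-monotonic predicate and "contains a given itemset" as the monotonic one. All helper names (`min_freq`, `contains`, `both`, `either`, `borders`, `is_version_space`) are hypothetical, and it enumerates the pattern space by brute force, so it illustrates the properties rather than an efficient query evaluator.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
DB = [
    frozenset("abc"),
    frozenset("abd"),
    frozenset("acd"),
    frozenset("bcd"),
    frozenset("ab"),
]
ITEMS = sorted(set().union(*DB))


def all_patterns():
    """Enumerate the whole pattern language: every itemset over ITEMS."""
    for k in range(len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            yield frozenset(combo)


def frequency(pattern):
    """Number of transactions in DB that contain the pattern."""
    return sum(1 for t in DB if pattern <= t)


def min_freq(threshold):
    """Anti-monotonic: if it holds for p, it holds for every subset of p."""
    return lambda p: frequency(p) >= threshold


def contains(items):
    """Monotonic: if it holds for p, it holds for every superset of p."""
    required = frozenset(items)
    return lambda p: required <= p


def both(f, g):
    return lambda p: f(p) and g(p)


def either(f, g):
    return lambda p: f(p) or g(p)


def answer(query):
    """Answer set of a query, computed by brute-force enumeration."""
    return {p for p in all_patterns() if query(p)}


def borders(patterns):
    """G: minimal (most general) answers; S: maximal (most specific) answers."""
    g = {p for p in patterns if not any(q < p for q in patterns)}
    s = {p for p in patterns if not any(p < q for q in patterns)}
    return g, s


def is_version_space(patterns):
    """True iff the set contains every pattern lying between its G and S borders."""
    g, s = borders(patterns)
    between = {
        p for p in all_patterns()
        if any(lo <= p for lo in g) and any(p <= hi for hi in s)
    }
    return between == patterns


def show(patterns):
    """Render itemsets as sorted strings so the output is deterministic."""
    return sorted("".join(sorted(p)) for p in patterns)


# Anti-monotonic AND monotonic: expected to form a single version space.
conj = answer(both(min_freq(2), contains("a")))
# Anti-monotonic OR monotonic: not convex in general.
disj = answer(either(min_freq(3), contains("bcd")))

if __name__ == "__main__":
    g, s = borders(conj)
    print(show(g), show(s))          # ['a'] ['ab', 'ac', 'ad']
    print(is_version_space(conj))    # True
    print(is_version_space(disj))    # False
    # The disjunction is the union of two version spaces.
    parts = [answer(min_freq(3)), answer(contains("bcd"))]
    print(all(is_version_space(v) for v in parts), disj == parts[0] | parts[1])  # True True
```

The disjunction fails the check because the empty itemset and {a, b, c, d} are both answers while {a, c}, which lies between them, is not. It is nevertheless the union of two version spaces, one per disjunct. Assuming "dimension" counts the minimal number of version spaces needed to represent an answer set, this query has dimension 2.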