Institute for Computer Science
Machine Learning and Natural Language Processing Lab
Master Thesis

Constraint-Based Data Mining

In constraint-based data mining, the task is to find all patterns from a formal language that satisfy a specified constraint with respect to a given database. This thesis presents two novel algorithms that compute this solution space. The first is completely randomized and applicable to any pattern language with a partial order. It constructs the borders of the version space "from within", i.e. by allowing an arbitrary sequence of pattern evaluations. The second algorithm is tailored to the domain of strings and builds on suffix tries. We again start from a randomized version and then move towards an optimal strategy. Both methods are evaluated empirically against existing algorithms.
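
To make the task statement concrete, the following is a minimal brute-force sketch and not either of the algorithms developed in the thesis. It assumes, purely for illustration, that patterns are substrings, that the database is a list of strings, and that the constraint is a minimum frequency threshold; the function names (`substrings`, `frequency`, `solution_space`, `maximal_border`) are hypothetical. Because minimum frequency is anti-monotone under the substring order, the solution space is fully described by its most specific (maximal) elements, which form one border of the version space.

```python
from typing import List, Set


def substrings(s: str) -> Set[str]:
    """Return all non-empty substrings of s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def frequency(pattern: str, database: List[str]) -> int:
    """Count the database strings that contain the pattern."""
    return sum(pattern in s for s in database)


def solution_space(database: List[str], min_freq: int) -> Set[str]:
    """Exhaustively collect every substring pattern with frequency >= min_freq.

    This evaluates every candidate, which is exactly the cost that
    border-based methods try to avoid.
    """
    candidates: Set[str] = set().union(*(substrings(s) for s in database))
    return {p for p in candidates if frequency(p, database) >= min_freq}


def maximal_border(solutions: Set[str]) -> Set[str]:
    """Keep the solutions that are not a proper substring of another solution."""
    return {
        p for p in solutions
        if not any(p != q and p in q for q in solutions)
    }


# Illustrative toy database (assumed, not taken from the thesis).
database = ["abcd", "abce", "xbcd"]
solutions = solution_space(database, min_freq=2)
print(sorted(solutions))                  # ['a', 'ab', 'abc', 'b', 'bc', 'bcd', 'c', 'cd', 'd']
print(sorted(maximal_border(solutions)))  # ['abc', 'bcd']
```

In this toy case, nine patterns satisfy the constraint, but the two border elements `abc` and `bcd` suffice to reconstruct all of them: every solution is a substring of one of the two.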