We are designing new data mining techniques for gene expression data, more precisely inductive querying techniques that extract a priori interesting bi-sets, i.e., sets of objects (or biological situations) and associated sets of attributes (or genes). The so-called (formal) concepts are important special cases of a priori interesting bi-sets in derived boolean expression matrices, e.g., matrices that encode over-expression of genes. Providing putative transcription modules is one of the main goals for molecular biologists, and to this end several post-processing tasks can be performed on the extracted bi-sets.
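To make the notion of a formal concept concrete, here is a minimal brute-force sketch on a toy boolean matrix. The data, the names `MATRIX`, `GENES`, and all function names are hypothetical and chosen for illustration only; this is not one of the efficient algorithms surveyed in the talk. A concept is taken here in the usual sense: a bi-set (situations, genes) such that the genes are exactly those over-expressed in all the situations, and the situations are exactly those in which all the genes are over-expressed.

```python
from itertools import combinations

# Hypothetical toy boolean matrix: each situation maps to the set of genes
# that are over-expressed in it (a True cell in the boolean matrix).
MATRIX = {
    "s1": {"g1", "g2"},
    "s2": {"g1", "g2", "g3"},
    "s3": {"g2", "g3"},
}
GENES = {"g1", "g2", "g3"}


def common_genes(situations):
    """Genes over-expressed in every given situation (all genes if none given)."""
    result = set(GENES)
    for s in situations:
        result &= MATRIX[s]
    return result


def supporting_situations(genes):
    """Situations in which every given gene is over-expressed."""
    return {s for s, row in MATRIX.items() if genes <= row}


def formal_concepts():
    """Enumerate all concepts by closing every subset of situations.

    Exponential in the number of situations: fine for a toy example only.
    """
    concepts = set()
    names = sorted(MATRIX)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            genes = common_genes(subset)
            situations = supporting_situations(genes)
            concepts.add((frozenset(situations), frozenset(genes)))
    return concepts


if __name__ == "__main__":
    for situations, genes in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(situations), sorted(genes))
```

On this toy matrix the enumeration yields four concepts, for example the bi-set ({s1, s2}, {g1, g2}).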
In this talk, we will survey our recent work on constraint-based mining for bi-sets. This work includes efficient techniques for closed set computation, and thus concept mining, in typical gene expression databases. A new algorithm that pushes monotonic constraints during concept extraction will be sketched. Finally, we will consider several post-processing techniques that are currently being studied in cooperation with molecular biologists (S. Blachon, Dr. O. Gandrillon, Dr. S. Rome). These include basic but efficient visualization techniques and the use of strong association rules.
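As an illustration of the association-rule post-processing mentioned above, the following sketch computes the standard support and confidence of a rule between gene sets on a toy boolean matrix. It assumes that "strong" means a confidence at or above a user-chosen threshold; that reading, the threshold value, the data, and the names `ROWS`, `support`, `confidence`, and `is_strong` are all assumptions made for illustration.

```python
# Hypothetical toy data: each situation maps to its over-expressed genes.
ROWS = {
    "s1": {"g1", "g2"},
    "s2": {"g1", "g2", "g3"},
    "s3": {"g2", "g3"},
}


def support(genes):
    """Number of situations in which all the given genes are over-expressed."""
    return sum(1 for row in ROWS.values() if genes <= row)


def confidence(body, head):
    """Confidence of the rule body -> head: support(body | head) / support(body)."""
    body_support = support(body)
    if body_support == 0:
        raise ValueError("rule body is not supported by any situation")
    return support(body | head) / body_support


def is_strong(body, head, min_confidence=0.9):
    """Assumed notion of 'strong': confidence at or above a chosen threshold."""
    return confidence(body, head) >= min_confidence


if __name__ == "__main__":
    print(confidence({"g1"}, {"g2"}))  # g2 is over-expressed whenever g1 is
    print(is_strong({"g2"}, {"g3"}))   # g3 accompanies g2 in 2 of 3 situations
```

Here the rule g1 -> g2 holds in every situation where g1 is over-expressed, whereas g2 -> g3 holds in only two of the three situations containing g2 and so falls below the assumed threshold.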