Institute for Computer Science



Seminar "Data Mining in der Bioinformatik" (Data Mining in Bioinformatics)

Prof. Dr. Luc De Raedt

Assisted by: Dr. Stefan Kramer, Dr. Christoph Helma, Dipl.-Inf. Kristian Kersting

(2 SWS, i.e. 2 semester hours per week)


Time: by arrangement - Location: SR 00-019, Building 079
Preliminary meeting: Thu., 26 April 2001, 15:00 - 17:00


The analysis of biological measurement data is a central task in bioinformatics. The goal is to detect patterns and regularities in the data that lead to new scientific insights. These patterns and regularities can be predictive (e.g. in classification or regression problems) or descriptive (e.g. in problems that are "only" about finding dependencies in data). The (automated) discovery of new knowledge is also the subject of the field of "data mining". A number of data mining techniques and tools exist today that can be used to detect various kinds of patterns and regularities in large databases.

This seminar covers the most important recent work in these two fields:
 

Introduction

L. Hunter: Molecular Biology for Computer Scientists. In L. Hunter (ed.), Artificial Intelligence and Molecular Biology, AAAI Press, 1993.

U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth: From Data Mining to Knowledge Discovery: An Overview. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (eds.), Advances in Knowledge Discovery and Data Mining, pp. 1-36, AAAI Press / The MIT Press, 1995.

S. Crotty, A. Basu, C. Onufryk, S. Finkelstein, D.R. Birgeneau, P. Sharp, A. Latham, E. Daida, A. Tahk, G. Tsan, V. Ingram, D.R. Silbey, R. Sauer: Large Molecules, Central Dogma, Prokaryotic Genetics and Gene Expression. Chapters 2, 6, 7 in MIT's Biology Hypertextbook, 1995-2001. http://esg-www.mit.edu:8001/esgbio/7001main.html

Pairwise Alignment

R. Durbin, S. Eddy, A. Krogh, G. Mitchison: Pairwise Alignment, Chapter 2 in Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.

Learning from Gene Expression Data

M.P.S. Brown, W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S. Furey, M. Ares, D. Haussler: Knowledge-based Analysis of Microarray Gene Expression Data by using Support Vector Machines, PNAS 97(1):262-267, 2000.

J. Komorowski, T.R. Hvidsten, T.-K. Jenssen, D. Tjeldvoll, E. Hovig, A.K. Sanvik, A. Laegreid: Towards Knowledge Discovery from cDNA Microarray Gene Expression Data, in Proceedings of the Fourth European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'2000), pp. 470-475, 2000.

M. Craven, D. Page, J. Shavlik, J. Bockhorst, J. Glasner: Using Multiple Levels of Learning and Diverse Evidence to Uncover Coordinately Controlled Genes, Proceedings of the Seventeenth International Conference on Machine Learning, pp. 199-206, 2000.

N. Friedman, M. Linial, I. Nachman, D. Pe'er: Using Bayesian Networks to Analyze Expression Data, Journal of Computational Biology, 7, 2000.

Protein Structure Prediction

M. Turcotte, S.H. Muggleton, M.J.E. Sternberg: The Effect of Relational Background Knowledge on Learning of Protein Three-Dimensional Fold Signatures, Machine Learning, to appear, 2001.

T.R. Ioerger, L.A. Rendell, S. Subramaniam: Searching for Representations to Improve Protein Sequence Fold-Class Prediction, Machine Learning, 21(1-2):151-175, 1995.

N. Abe, H. Mamitsuka: Predicting Protein Secondary Structure Using Stochastic Tree Grammars, Machine Learning, 29(2/3):275-301, 1997.

Motif Discovery

D. Conklin: Machine Discovery of Protein Motifs, Machine Learning, 21(1-2):125-150, 1995.

Y.-J. Hu, S. Sandmeyer, D. Kibler: Detecting Motifs from Sequences, Machine Learning: Proceedings of the Sixteenth International Conference (ICML'99), pp. 181-190, 1999.

J.T.L. Wang, T.G. Marr, S. Rozen, D. Shasha, B.A. Shapiro, G.-W. Chirn, Z. Wang, K. Zhang: Pattern Discovery and Classification in Biosequences, in J. Wang, B.A. Shapiro, D. Shasha (eds.), Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications, Oxford University Press, 1999.

J. Glasgow, E. Steeg, S. Fortier: Motif Discovery in Protein Structure Databases, in J. Wang, B.A. Shapiro, D. Shasha (eds.), Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications, Oxford University Press, 1999.

D.J. Cook, L.B. Holder, S. Su, R. Maglothin, I. Jonyer: Structural Mining of Molecular Biology Data, IEEE Engineering in Medicine and Biology, special issue on Advances in Genomics, to appear, 2001.

S. Su, D.J. Cook, L.B. Holder: Knowledge Discovery in Molecular Biology: Identifying Structural Regularities in Proteins, Intelligent Data Analysis, 3:413-436, 1999.

X. Wang, J.T.L. Wang, D. Shasha, B.A. Shapiro, S. Dikshitulu, I. Rigoutsos, K. Zhang: Automated Discovery of Active Motifs in Three Dimensional Molecules, in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD 1997), pp. 89-95, 1997.

Protein Class Recognition

S. Muggleton, C.H. Bryant, A. Srinivasan: Learning Chomsky-like Grammars for Biological Sequence Families, Proceedings of the Seventeenth International Conference on Machine Learning, pp. 631-638, 2000.