Seminar "Data Mining in Bioinformatics"
Prof. Dr. Luc De Raedt
With: Dr. Stefan Kramer, Dr. Christoph Helma, Dipl.-Inf. Kristian Kersting
(2 semester hours per week)
Time: by arrangement - Location: SR 00-019, Building 079
The analysis of biological measurement data is a central task in
bioinformatics. The goal is to detect patterns and regularities in the
data that lead to new scientific insights. These patterns and
regularities can be predictive (as in classification or regression
problems) or descriptive (as in problems that are "only" about finding
dependencies in data). The (automated) discovery of new knowledge is also
the subject of the field of data mining. Many data mining techniques and
tools exist today that can be used to detect various kinds of patterns
and regularities in large databases.
This seminar covers the most important recent work in these two fields:
Introduction
L. Hunter: Molecular Biology for Computer Scientists. In L. Hunter (ed.),
Artificial Intelligence and Molecular Biology, AAAI Press, 1993.
U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth: From Data Mining to Knowledge
Discovery: An Overview. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth,
R. Uthurusamy (eds.), Advances in Knowledge Discovery and Data Mining,
pp. 1-36, AAAI Press / The MIT Press, 1995.
S. Crotty, A. Basu, C. Onufryk, S. Finkelstein, D.R. Birgeneau, P. Sharp,
A. Latham, E. Daida, A. Tahk, G. Tsan, V. Ingram, D.R. Silbey, R. Sauer:
Large Molecules, Central Dogma, Prokaryotic Genetics and Gene Expression.
Chapters 2, 6, 7 in MIT's Biology Hypertextbook, 1995-2001. http://esg-www.mit.edu:8001/esgbio/7001main.html
Pairwise Alignment
R. Durbin, S. Eddy, A. Krogh, G. Mitchison: Pairwise Alignment, Chapter
2 in Biological Sequence Analysis: Probabilistic Models of Proteins and
Nucleic Acids, Cambridge University Press, 1998.
Learning from Gene Expression Data
M.P.S. Brown, W.N. Grundy, D. Lin, N. Cristianini, C.W. Sugnet, T.S.
Furey, M. Ares, D. Haussler: Knowledge-based Analysis of Microarray Gene
Expression Data by using Support Vector Machines, PNAS 97(1):262-267,
2000.
J. Komorowski, T.R. Hvidsten, T.-K. Jenssen, D. Tjeldvoll, E. Hovig,
A.K. Sandvik, A. Laegreid: Towards Knowledge Discovery from cDNA Microarray
Gene Expression Data, in Proceedings of the Fourth European Conference
on Principles and Practice of Knowledge Discovery in Databases (PKDD'2000),
pp. 470-475, 2000.
M. Craven, D. Page, J. Shavlik, J. Bockhorst, J. Glasner: Using Multiple
Levels of Learning and Diverse Evidence to Uncover Coordinately Controlled
Genes, Proceedings of the Seventeenth International Conference on Machine
Learning, pp. 199-206, 2000.
N. Friedman, M. Linial, I. Nachman, D. Pe'er: Using Bayesian Networks
to Analyze Expression Data, Journal of Computational Biology, 7, 2000.
Protein Structure Prediction
M. Turcotte, S.H. Muggleton, M.J.E. Sternberg: The Effect of Relational
Background Knowledge on Learning of Protein Three-Dimensional Fold Signatures,
Machine Learning, to appear, 2001.
T.R. Ioerger, L.A. Rendell, S. Subramaniam: Searching for Representations
to Improve Protein Sequence Fold-Class Prediction, Machine Learning, 21(1-2):151-175,
1995.
N. Abe, H. Mamitsuka: Predicting Protein Secondary Structure Using Stochastic
Tree Grammars, Machine Learning, 29(2-3):275-301, 1997.
Motif Discovery
D. Conklin: Machine Discovery of Protein Motifs, Machine Learning, 21(1-2):
125-150, 1995.
Y.-J. Hu, S. Sandmeyer, D. Kibler: Detecting Motifs from Sequences, in Machine
Learning: Proceedings of the Sixteenth International Conference (ICML'99),
pp. 181-190, 1999.
J.T.L. Wang, T.G. Marr, S. Rozen, D. Shasha, B.A. Shapiro, G.-W. Chirn,
Z. Wang, K. Zhang: Pattern Discovery and Classification in Biosequences,
in J. Wang, B.A. Shapiro, D. Shasha (eds.), Pattern Discovery in Biomolecular
Data: Tools, Techniques, and Applications, Oxford University Press, 1999.
J. Glasgow, E. Steeg, S. Fortier: Motif Discovery in Protein Structure
Databases, in J. Wang, B.A. Shapiro, D. Shasha (eds.), Pattern Discovery
in Biomolecular Data: Tools, Techniques, and Applications, Oxford University
Press, 1999.
D.J. Cook, L.B. Holder, S. Su, R. Maglothin, I. Jonyer: Structural Mining
of Molecular Biology Data, IEEE Engineering in Medicine and Biology, special
issue on Advances in Genomics, to appear, 2001.
S. Su, D.J. Cook, L.B. Holder: Knowledge Discovery in Molecular Biology:
Identifying Structural Regularities in Proteins, Intelligent Data Analysis,
3:413-436, 1999.
X. Wang, J.T.L. Wang, D. Shasha, B.A. Shapiro, S. Dikshitulu, I. Rigoutsos,
K. Zhang: Automated Discovery of Active Motifs in Three Dimensional Molecules,
in Proceedings of the 3rd International Conference on Knowledge Discovery
and Data Mining (KDD 1997), pp. 89-95, 1997.
Protein Class Recognition
S. Muggleton, C.H. Bryant, A. Srinivasan: Learning Chomsky-like Grammars
for Biological Sequence Families. Proceedings of the Seventeenth International
Conference on Machine Learning, pp. 631-638, 2000.