Seminar
Data Mining: The Practice
Dr. Andreas Karwath
- Time and Place:
- The seminar will be held as two en-bloc seminars (most likely distributed over four afternoon sessions). We will have a first round of talks (each lasting no more than 15 minutes), probably on Wednesday, the 6th of June, and the second round of talks on the afternoons of the 20th and 27th of June and the 4th of July. Please note that these dates are subject to change! The actual date will be arranged together with the
- Announcements:
- We'll meet today (27.06.07) in building 52, room 02-017 at 13:00 SHARP!!
- Schedule for Meetings:
Colour scheme: grey = past, pink = next meeting, yellow = future.
| Date | Time | Room | Purpose | Info |
| 20.04.07 | 16:15 | 101, 01-018 | Initial Meeting; setting the rules | Intro PDF |
| 06.06.07 | 14:00 - 18:00 | 101, 00 010/14 | First round talks | |
| 27.06.07 | 13:00 - 18:00 (provisional) | 52, 02-017 | Second round talks | |
| 31.08.07 | 24:00 | None | Deadline for reports | |
- Schedule Second Round Talks (on the 27.06.2007):
Please note that this is a provisional schedule and is subject to change. I'll try to video the talks and put them onto CD for your own, personal use!
| Start | End | Student | Theme | Extras |
| 13:00 | 13:45 | D. Georgen | Aggregate Features and AdaBoost for Music Classification (mus2) | |
| 13:45 | 13:50 | 5 min break | | |
| 13:50 | 14:35 | F. Meyer | Comparing association rules and decision trees for disease prediction (med1) | |
| 14:35 | 14:40 | 5 min break | | |
| 14:40 | 15:25 | M. Grützner | Induction of compact decision trees for personalized recommendation (eco2) | |
| 15:25 | 15:30 | 5 min break | | |
| 15:30 | 16:15 | D. Butt | Predicting customer shopping lists from point-of-sale purchase data (eco1) | |
| 16:15 | 16:20 | 5 min break | | |
| 16:20 | 17:05 | I. Berger | Active EM to reduce noise in activity recognition (act2) | |
- Schedule First Round Talks (on the 06.06.2007):
Please note that this is a provisional schedule and is subject to change. I'll try to video the talks and put them onto CD for your own, personal use!
| Start | End | Student | Theme | Extras |
| 14:00 | 14:15 | G. Lippert | Collective Entity Resolution in Relational Data (net1) | |
| 14:15 | 14:30 | D. Georgen | Aggregate Features and AdaBoost for Music Classification (mus2) | |
| 14:30 | 14:45 | B.A. Gorog | Exploring Pianist Performance Styles with Evolutionary String Matching (mus1) | |
| 14:45 | 14:55 | Break | | |
| 14:55 | 15:10 | M.B. Freitag | Subsequence matching on structured time series data (med3) | |
| 15:10 | 15:25 | F. Meyer | Comparing association rules and decision trees for disease prediction (med1) | |
| 15:25 | 15:40 | M. Grützner | Induction of compact decision trees for personalized recommendation (eco2) | |
| 15:40 | 15:50 | Break | | |
| 15:50 | 16:05 | D. Butt | Predicting customer shopping lists from point-of-sale purchase data (eco1) | |
| 16:05 | 16:20 | W. Ali | Sensing from the basement: a feasibility study of unobtrusive and low-cost home activity recognition (act1) | |
| 16:20 | 16:35 | I. Berger | Active EM to reduce noise in activity recognition (act2) | |
| 16:35 | 16:50 | D. Selle | I never sent an email to the lecturer (surprise us 1) | |
- Overview page:
- The web page for an overview (language, credit points, etc.) of this course can be found here.
- Registration:
- Registration is done using the official registration server.
- Reports:
Reports have to be handed in by 31.08.07, midnight, at the
latest. Please submit your report in PDF format via email. You are
encouraged to use LaTeX; exceptionally, you can use
Word. However, I still insist on PDF. Use the Springer style
for Lecture Notes in Computer Science. For this, download one
of the two files: report.tgz (to extract:
tar xvfz report.tgz) or report.zip. After
extraction, you will have two subdirectories in a directory
called report: one for LaTeX and one for Word. I
have included a sample document called
reportBlank.{tex|doc}. In LaTeX you can use bibtex (the usual
line of commands is: latex bibtex latex latex dvipdf...). In
Word I guess you have to do most of the formatting yourself and
need Adobe Distiller to produce PDFs (or other tools?)
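The usual LaTeX command sequence mentioned above can be sketched as a small shell function. This is only a sketch under assumptions: the function name build_report and the DRY_RUN switch are made up for illustration and are not part of the provided report package, and it presumes that latex, bibtex and dvipdf are installed.

```shell
#!/bin/sh
# Sketch of the usual build cycle: latex, bibtex, latex, latex, dvipdf.
# Hypothetical helper: build_report takes the file name without extension.
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
build_report() {
    base="$1"
    if [ -n "$DRY_RUN" ]; then run="echo"; else run=""; fi
    $run latex "$base" &&      # first pass: writes the citation keys to the .aux file
    $run bibtex "$base" &&     # builds the bibliography from the .aux file
    $run latex "$base" &&      # second pass: pulls in the bibliography
    $run latex "$base" &&      # third pass: resolves the remaining references
    $run dvipdf "$base.dvi"    # converts the DVI output to PDF
}
```

For the sample document this would be called as `build_report reportBlank` from the directory that contains reportBlank.tex. Each step only runs if the previous one succeeded, so a LaTeX error stops the chain early.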
The report should summarize the approach(es) you have presented
and also look at what else is around in that
area. Furthermore, you might want to include some personal
opinions about the work presented.
Overall, the report should have a maximum of 14 pages in Springer LNCS style,
including all figures, pictures, and references.
ATTENTION: You have to use the styles provided. If you change
the styles in any way (margins, font size, ...) I might not
mark the report or at least take marks away!
- Location:
- See meeting schedule (above)
- Schedule for the talks (Time and place to be announced here soon):
- Subjects:
- Musicology:
- ID:mus1
Madsen S.T., Widmer G.: Exploring Pianist Performance Styles with Evolutionary String Matching, International Journal on Artificial Intelligence Tools. World Scientific Publishing Company, 15(4), 495-514. (2006). PDF
Keywords: Piano playing, SOM, approximate string matching, evolutionary algorithms
- ID:mus2
Bergstra J., Casagrande N., Erhan D., Eck D., Kégl B.: Aggregate Features and AdaBoost for Music Classification. PDF
Keywords: genre classification, artist recognition, audio feature aggregation, AdaBoost
- ID:mus3
Tobudic A., Widmer G.: Relational IBL in Classical Music. Machine Learning, 64:5-24 (2006). PDF
Keywords: relational instance based learning, learning to play music
- Neuroscience:
- ID:neu1
Fan Y., Shen D., Davatzikos C.: Detecting Cognitive States from fMRI images by machine learning and multivariate classification MIUA 2006 PDF
Keywords: brain images, feature extraction, SVM
- ID:neu2
Shenoy P., Rao R. :
Dynamic Bayes Networks for Brain-Computer Interfacing
NIPS 2005, 17 PDF
Keywords: dynamic bayes networks, brain-computer interface (BCI), SVM
- Activity/Intention Prediction:
- ID:act1
Fogarty J., Au C., Hudson S.E.: Sensing from the basement: a feasibility study of unobtrusive and low-cost home activity recognition. UIST '06, 91-100, 2006. PDF
Keywords: activity recognition, sensing in the home, sensor-based models, SVM, WEKA
- ID:act2
Shen J., Dietterich T.G.: Active EM to reduce noise in activity recognition IUI '07, 132-140, 2007. PDF
Keywords: active learning, expectation-maximization (EM), intelligent interface, machine learning, noise
- ID:act3
Beetz M., v. Hoyningen-Huene N., Bandouch J., Kirchlechner B., Gedikli S., Maldonado A.: Camera-based observation of football games for analyzing multi-agent activities AAMAS '06, 42-49, 2006. PDF
Keywords: analysis of intentional activity, motion interpretation, motion tracking, object tracking, state estimation, video analysis
- Economics/Retail:
- ID:eco1
Cumby C., Fano A., Ghani R., Krema M.: Predicting customer shopping lists from point-of-sale purchase data. KDD '04, 402-409, 2004. PDF
Keywords: retail data mining, classification, machine learning (variety of algorithms)
- ID:eco2
Nikovski D., Kulev V: Induction of compact decision trees for personalized recommendation SAC '06, 575-581, 2006 PDF
Keywords: frequent item-set mining, product recommendation, response modeling, decision trees
- Biology/Bioinformatics:
- ID:bio1
Huang C., Morcos F., Kanaan S.P., Wuchty S., Chen D.Z., Izaguirre J.A. :
Predicting Protein-Protein Interactions from Protein Domains Using a Set Cover Approach
IEEE/ACM Trans. Comput. Biol. Bioinformatics, 4(1), 78-87, 2007. PDF
Keywords: Computations on discrete structures, graph algorithms, bioinformatics (genome or protein) databases, biology, genetics.
- Medicine:
- ID:med1
Ordonez C.:
Comparing association rules and decision trees for disease prediction HIKM '06, 17-24, 2006. PDF
Keywords: association rule, decision tree, medical data
- ID:med2
Rao R.B., Krishnan S., Niculescu R.S:
Data mining for improved cardiac care
SIGKDD Explor. Newsl. 8(1), 1931-0145, 2006. PDF
Keywords: Medical information systems, probabilistic reasoning
- ID:med3
Wu H., Salzberg B, Sharp G.C., Jiang S.B., Shirato H., Kaeli D.:
Subsequence matching on structured time series data
SIGMOD '05, 682-693, 2005 PDF
Keywords: time series, clustering, tumor analysis
- Geography/Weather/Climate:
- ID:geo1
Basak J., Sudarshan A., Trivedi D., Santhanam M.S.:
Weather Data Mining Using Independent Component Analysis
J. Mach. Learn. Res., 5, 239-253, 2004. PDF
Keywords: principal component analysis (PCA), weather data, spatio-temporal pattern mining
- Networks:
- ID:net1
Bhattacharya I., Getoor L. :
Collective Entity Resolution in Relational Data
TKDD Volume 1(1), 2007. PDF
Keywords: citation analysis, entity resolution
General information (subject to change):
- We will have a number of afternoon seminars, most
likely spread over four afternoons. Each student will have to give a short
presentation (max 15 minutes) and a long presentation lasting roughly 35
minutes + 10 minutes for questions. The purpose of the first
round is to give you feedback on how to improve your
presentation skills and on how to better present
the subject in the second round.