Institute for Computer Science

Seminar 

Data Mining: The Practice

Dr. Andreas Karwath


  • Time and Place:

    • The seminar will be held as two en-bloc seminars (most likely distributed over four afternoon sessions). We will have a first round of talks (each lasting no more than 15 minutes), probably on Wednesday, the 6th of June, and the second round of talks on the afternoons of the 20th and 27th of June and the 4th of July. Please note that these dates are subject to change! The actual date will be arranged together with the

  • Announcements:

    • We'll meet today (27.06.07) in building 52, room 02-017 at 13:00 SHARP!!

  • Schedule for Meetings:

    Colour scheme: grey = passed, pink = next meeting, yellow = future.

    Date      Time                         Room            Purpose               Info
    20.04.07  16:15                        101, 01-018     Initial Meeting       Setting the rules; Intro PDF
    06.06.07  14:00 - 18:00                101, 00 010/14  First round talks
    27.06.07  13:00 - 18:00 (provisional)  52, 02-017      Second round talks
    31.08.07  24:00                        None            Deadline for reports


  • Schedule Second Round Talks (on the 27.06.2007):


    Please note that this is a provisional schedule and is subject to change. I'll try to video the talks and put them onto CD for your own, personal use!

    Start End Student Theme Extras
    13:00 13:45 D. Georgen Aggregate Features and AdaBoost for Music Classification (mus2)
    13:45 13:50 5 min break
    13:50 14:35 F. Meyer Comparing association rules and decision trees for disease prediction (med1)
    14:35 14:40 5 min break
    14:40 15:25 M. Grützner Induction of compact decision trees for personalized recommendation (eco2)
    15:25 15:30 5 min break
    15:30 16:15 D. Butt Predicting customer shopping lists from point-of-sale purchase data (eco1)
    16:15 16:20 5 min break
    16:20 17:05 I. Berger Active EM to reduce noise in activity recognition (act2)


  • Schedule First Round Talks (on the 06.06.2007):


    Please note that this is a provisional schedule and is subject to change. I'll try to video the talks and put them onto CD for your own, personal use!

    Start End Student Theme Extras
    14:00 14:15 G. Lippert Collective Entity Resolution in Relational Data (net1)
    14:15 14:30 D. Georgen Aggregate Features and AdaBoost for Music Classification (mus2)
    14:30 14:45 B.A. Gorog Exploring Pianist Performance Styles with Evolutionary String Matching (mus1)
    14:45 14:55 Break
    14:55 15:10 M.B. Freitag Subsequence matching on structured time series data (med3)
    15:10 15:25 F. Meyer Comparing association rules and decision trees for disease prediction (med1)
    15:25 15:40 M. Grützner Induction of compact decision trees for personalized recommendation (eco2)
    15:40 15:50 Break
    15:50 16:05 D. Butt Predicting customer shopping lists from point-of-sale purchase data (eco1)
    16:05 16:20 W. Ali Sensing from the basement: a feasibility study of unobtrusive and low-cost home activity recognition (act1)
    16:20 16:35 I. Berger Active EM to reduce noise in activity recognition (act2)
    16:35 16:50 D. Selle I never sent an email to the lecturer (surprise us 1)


  • Overview page:

    • The overview page for this course (language, credit points, etc.) can be found here.

  • Registration:

    • Registration is done using the official registration server.

  • Reports:


    Reports must be handed in by midnight on 31.08.07 at the latest. Please submit your report in PDF format via email. You are encouraged to use LaTeX; exceptionally, you can use Word. However, I still insist on PDF. Use the Springer style for Lecture Notes in Computer Science (LNCS). For this, download one of the two files: report.tgz (to extract: tar xvfz report.tgz) or report.zip. After extraction, you will have two subdirectories in a directory called report: one for LaTeX and one for Word. I have included a sample document called reportBlank.{tex|doc}. In LaTeX you can use BibTeX (the usual sequence of commands is: latex bibtex latex latex dvipdf...). In Word, I guess you have to do most of the formatting yourself and need Adobe Distiller (or another tool) to produce PDFs.
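
    The command sequence mentioned above could be wrapped in a small shell function, sketched below. It assumes the sample file reportBlank.tex sits in the current directory and that latex, bibtex and dvipdf are installed. The names build_report, run and DRY_RUN are made up for this illustration and are not part of the provided report package.

```shell
#!/bin/sh
# Sketch: build a PDF with the latex / bibtex / latex / latex / dvipdf chain.
# Assumptions: latex, bibtex and dvipdf are on the PATH, and the source file
# is <base>.tex in the current directory (default base: reportBlank).
# DRY_RUN is a hypothetical switch: if set to 1, commands are only printed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_report() {
    base="${1:-reportBlank}"
    run latex "$base.tex" &&   # first pass: writes citation keys to the .aux file
    run bibtex "$base" &&      # resolves the citations against the .bib database
    run latex "$base.tex" &&   # second pass: pulls in the generated bibliography
    run latex "$base.tex" &&   # third pass: settles the cross-references
    run dvipdf "$base.dvi"     # converts the DVI output to <base>.pdf
}
```

    After sourcing the file, calling build_report reportBlank runs the chain and stops at the first failing step; setting DRY_RUN=1 beforehand only prints the commands.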

    The report should summarize the approach(es) you have presented and survey what else has been done in that area. Furthermore, you might want to include some personal opinions about the work presented.

    Overall, the report should have a maximum of 14 pages in Springer LNCS style, including all figures, pictures, and references.

    ATTENTION: You have to use the styles provided. If you change the styles in any way (margins, font size, ...), I might not mark the report, or at least take marks away!

  • Location:

    See meeting schedule (above)

  • Schedule for the talks (Time and place to be announced here soon):


  • Subjects:

    • Musicology:

      • ID:mus1
        Madsen S.T., Widmer G.:
        Exploring Pianist Performance Styles with Evolutionary String Matching,
        International Journal on Artificial Intelligence Tools. World Scientific Publishing Company, 15(4), 495-514. (2006). PDF
        Keywords: Piano playing, SOM, approximate string matching, evolutionary algorithms

      • ID:mus2
        Bergstra J., Casagrande N., Erhan D., Eck D., Kégl B.:
        Aggregate Features and AdaBoost for Music Classification.
        PDF
        Keywords: genre classification, artist recognition, audio feature aggregation, AdaBoost

      • ID:mus3
        Tobudic A., Widmer G.:
        Relational IBL in Classical Music
        Machine Learning, 64:5-24 (2006) PDF
        Keywords: relational instance based learning, learning to play music

    • Neuroscience:

      • ID:neu1
        Fan Y., Shen D., Davatzikos C.:
        Detecting Cognitive States from fMRI images by machine learning and multivariate classification
        MIUA 2006 PDF
        Keywords: brain images, feature extraction, SVM

      • ID:neu2
        Shenoy P., Rao R.:
        Dynamic Bayes Networks for Brain-Computer Interfacing
        NIPS 2005, 17 PDF
        Keywords: dynamic bayes networks, brain-computer interface (BCI), SVM

    • Activity/Intention Prediction:

      • ID:act1
        Fogarty J., Au C., Hudson S.E.:
        Sensing from the basement: a feasibility study of unobtrusive and low-cost home activity recognition
        UIST '06, 91-100, 2006. PDF
        Keywords: activity recognition, sensing in the home, sensor-based models, SVM, WEKA

      • ID:act2
        Shen J., Dietterich T.G.:
        Active EM to reduce noise in activity recognition
        IUI '07, 132-140, 2007. PDF
        Keywords: active learning, expectation-maximization (EM), intelligent interface, machine learning, noise

      • ID:act3
        Beetz M., v. Hoyningen-Huene N., Bandouch J., Kirchlechner B., Gedikli S., Maldonado A.:
        Camera-based observation of football games for analyzing multi-agent activities
        AAMAS '06, 42-49, 2006. PDF
        Keywords: analysis of intentional activity, motion interpretation, motion tracking, object tracking, state estimation, video analysis

    • Economics/Retail:

      • ID:eco1
        Cumby C., Fano A., Ghani R., Krema M.:
        Predicting customer shopping lists from point-of-sale purchase data.
        KDD '04, 402-409, 2004. PDF
        Keywords: retail data mining, classification, machine learning (variety of algorithms)

      • ID:eco2
        Nikovski D., Kulev V.:
        Induction of compact decision trees for personalized recommendation
        SAC '06, 575-581, 2006 PDF
        Keywords: frequent item-set mining, product recommendation, response modeling, decision trees

    • Biology/Bioinformatics:

      • ID:bio1
        Huang C., Morcos F., Kanaan S.P., Wuchty S., Chen D.Z., Izaguirre J.A.:
        Predicting Protein-Protein Interactions from Protein Domains Using a Set Cover Approach
        IEEE/ACM Trans. Comput. Biol. Bioinformatics, 4(1), 78-87, 2007. PDF
        Keywords: Computations on discrete structures, graph algorithms, bioinformatics (genome or protein) databases, biology, genetics.


    • Medicine:
      • ID:med1
        Ordonez C.:
        Comparing association rules and decision trees for disease prediction
        HIKM '06, 17-24, 2006. PDF
        Keywords: association rule, decision tree, medical data

      • ID:med2
        Rao R.B., Krishnan S., Niculescu R.S.:
        Data mining for improved cardiac care
        SIGKDD Explor. Newsl. 8(1), 1931-0145, 2006. PDF
        Keywords: Medical information systems, probabilistic reasoning

      • ID:med3
        Wu H., Salzberg B., Sharp G.C., Jiang S.B., Shirato H., Kaeli D.:
        Subsequence matching on structured time series data
        SIGMOD '05, 682-693, 2005 PDF
        Keywords: time series, clustering, tumor analysis

    • Geography/Weather/Climate:

      • ID:geo1
        Basak J., Sudarshan A., Trivedi D., Santhanam M.S.:
        Weather Data Mining Using Independent Component Analysis
        J. Mach. Learn. Res., 5, 239-253, 2004. PDF
        Keywords: principal component analysis (PCA), weather data, spatio-temporal pattern mining


    • Networks:

      • ID:net1
        Bhattacharya I., Getoor L.:
        Collective Entity Resolution in Relational Data
        TKDD Volume 1(1), 2007. PDF
        Keywords: citation analysis, entity resolution





  • General information (subject to change):

    • We will have a number of afternoon seminar sessions, most likely four. Each student will have to give a short presentation (max. 15 minutes) and a long presentation lasting roughly 35 minutes plus 10 minutes for questions. The purpose of the first round is to give you feedback on your presentation skills and on how to improve your presentation of the subject for the second round.