Institute for Computer Science



Web Mining

Prof. Dr. Luc De Raedt

Dipl.-Inf. Kristian Kersting

Seminar (2), Thursday, 14:00-16:00, SR 079-10-019 (basement)


Next meeting: Thursday, 1 February 2001, 13:30 s.t.


The World-Wide Web has seen a period of enormous growth and is still growing at a rapid pace. Retrieving data and extracting knowledge from the Web is considered one of the most challenging research problems in practical computer science. Researchers from the Artificial Intelligence (AI) and Data Mining communities have realized that the Web is an area where their techniques are badly needed and have great potential to solve some of the most important problems. Much research on these topics is under way worldwide and is published every year at the major AI conferences. In this seminar, students have the opportunity to familiarize themselves with the application of Machine Learning and Data Mining techniques to the Web.

Topics of papers presented in the seminar are: 

  • Text Classification 
  • Personal Information Agents, i.e., agents (autonomously "acting" and "sensing" computer programs) that gather information for the user. 
  • Web Mining, i.e., the extraction of knowledge from the Web. 
  • Intelligent Browsing, i.e., the extension of Web browsers with intelligent capabilities, such as suggesting which links to follow next.
  • Web Search 
  • Collaborative Filtering, i.e., techniques that enable users to exploit similarities between interests and tastes for information filtering.

The first meeting for this seminar will be in the third week of October.

Schedule


Short presentations on 11 January 2001

Name            Topic
Goetz Sattler   WebWatcher
Marcin Nadolny  Data Mining
Ulrich Kuhn     Agents

Main talks on 11 January 2001

Kristian Kersting   Relational Learning with Statistical Predicate Invention

Main talks on 1 February 2001

Name            Topic
Goetz Sattler   Adaptive Web sites
Marcin Nadolny  WHIRL
Ulrich Kuhn     Internet Portals

List of literature


  1. From Data Mining to Knowledge Discovery: An Overview. Usama M. Fayyad, Gregory Piatetsky-Shapiro and Padhraic Smyth. In Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth and Ramasamy Uthurusamy, editors, "Advances in Knowledge Discovery and Data Mining", pages 1-36, AAAI Press / The MIT Press.
     (Book available on request)
  2. The Process of Knowledge Discovery in Databases: A Human-Centered Approach. Ronald J. Brachman and Tej Anand. In Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth and Ramasamy Uthurusamy, editors, "Advances in Knowledge Discovery and Data Mining", pages 37-58, AAAI Press / The MIT Press.
     (Book available on request)
  3. Data mining for hypertext: A tutorial survey. Soumen Chakrabarti. In SIGKDD Explorations, Volume 1, Number 2, January 2000.
     (pdf, ps)

  4. Wrapper induction: Efficiency and expressiveness. Nicholas Kushmerick. In Artificial Intelligence, Volume 118, Issues 1-2, pages 15-68, April 2000.
     (pdf)

  5. Learning to construct knowledge bases from the World Wide Web. Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam and Seán Slattery. In Artificial Intelligence, Volume 118, Issues 1-2, pages 69-113, April 2000.
     (pdf)
  6. Relational Learning with Statistical Predicate Invention: Better Models for Hypertext. Mark Craven and Seán Slattery. To appear in Machine Learning Journal.
     (printed article available on request)

  7. WHIRL: A word-based information representation language. William W. Cohen. In Artificial Intelligence, Volume 118, Issues 1-2, pages 163-196, April 2000.
     (pdf)

  8. Towards adaptive Web sites: Conceptual framework and case study. Mike Perkowitz and Oren Etzioni. In Artificial Intelligence, Volume 118, Issues 1-2, pages 245-275, April 2000.
     (pdf)

  9. WebWatcher: A Learning Apprentice for the World Wide Web. R. Armstrong, D. Freitag, T. Joachims and T. Mitchell. In Working Notes of the AAAI Spring Symposium Series on Information Gathering from Distributed, Heterogeneous Environments, Stanford, 1995.
     (ps.Z)
  10. WebWatcher: A Tour Guide for the World Wide Web. T. Joachims, D. Freitag and T. Mitchell. In Proceedings of the 1997 IJCAI, August 1997.
     (ps.gz)

  11. Automating the Construction of Internet Portals with Machine Learning. Andrew McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore. In Information Retrieval Journal, Volume 3, pages 127-163, Kluwer, 2000.
     (ps.gz)
  12. Using Reinforcement Learning to Spider the Web Efficiently. Jason Rennie and Andrew K. McCallum. In Proceedings of the ICML-1999 Workshop "Machine Learning in Text Data Analysis".
     (ps.gz)

Background:

  • Intelligent Internet systems. Alon Y. Levy and Daniel S. Weld. In Artificial Intelligence, Volume 118, Issues 1-2, pages 1-14, April 2000.
    (pdf)
  • Web Mining Research: A Survey. R. Kosala and H. Blockeel. In SIGKDD Explorations, Volume 2, Number 1, pages 1-15, 2000.
    (pdf, ps)

Slides: