Title

Applying a data mining query language to the discovery of interesting patterns in WEB Logs

Authors

Rosa Meo, Marco Botta and Roberto Esposito
Dipartimento di Informatica, Università di Torino, Italy

Maristella Matera
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy

Abstract

Inductive databases, and more generally data mining applications for knowledge discovery and decision support, aim at discovering hidden patterns in the available data that are interesting to the user or analyst.

Inductive databases were proposed by Mannila and Imielinski as a powerful tool for supporting the knowledge discovery process. This power comes from the flexibility and expressiveness of the query languages available in the inductive database framework. In their paper, they stated that, with the aid of an inductive database, knowledge discovery is just a matter of the expressive power of the query languages.

In this paper, we show that this statement holds. We demonstrate the use of an inductive database in a real case study: the analysis of log data recording user activity on the WEB. The observed WEB site belongs to the Department of Electronics and Information Systems at Politecnico di Milano. The WEB logs were formatted in XML and contained, for each page request, information on the user's crawling session. Moreover, since the WEB pages are dynamically generated, the logs also maintained identifiers of the information content units of each page.

We aimed at the discovery of patterns that span very different categories:

  1. recurrent web crawling paths or most frequently generated clickstreams,
  2. user profiles based on the usage of web resources, such as the traffic generated over the network, or the hours of the day in which most visits to the web site occur,
  3. page contents most frequently occurring in the visited pages, and
  4. anomaly detection, with the purpose of discovering intrusion attempts or dangerous usage of the resources, and so on.

Moreover, we show that discovering a large spectrum of interesting patterns is possible using query languages alone, and turns out to be a relatively easy task. We used SQL for data pre-processing and post-processing, and a single yet powerful data mining query language, MINE RULE, to extract the interesting patterns from the database. The patterns discovered in WEB log analysis, an important application domain today, are useful to web and system administrators and to web application designers.

Slides

PDF (1183391 bytes)

Last modified: 2004/04/05 11:59:51 (UTC)