Automatic Web Data Extraction for Information Monitoring

A wealth of information on the World Wide Web is published in presentation-oriented, semi-structured HTML pages, which makes it difficult for machines to access the content automatically. Tools are therefore required that can extract the information contained in such resources and transform it into machine-readable and machine-understandable formats.


As the vision of the Semantic Web still seems far from being accomplished, owing to the lack of simple mechanisms, we propose a fully automatic wrapper prototype called ViPER (Visual perception-based extraction of records), which extracts repetitive structured data records contained in HTML pages with high precision and recall. ViPER tries to identify repetitive structures in the HTML source code and then weights and separates the resulting patterns according to the 2D layout information provided by the browser's rendering. After the extraction process, the pattern with the highest weight is aligned in a table, which makes the data easy to post-process. By mapping the data into a structured format, machines are finally able to process the relevant content automatically.
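The idea of detecting repeated structures and aligning the strongest pattern in a table can be illustrated with a minimal sketch. This is not ViPER's actual algorithm: it has no access to rendering information, so the "weight" of a pattern is simply assumed to be the number of text fields it covers, as a crude stand-in for the visual weighting described above. All names (`extract_records`, `signature`, `SAMPLE_HTML`) are hypothetical, and the parser assumes properly nested HTML.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.texts = []  # non-empty text fragments directly inside this node


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; assumes properly nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get a closing tag
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.texts.append(text)


def signature(node):
    """Tag structure of a subtree, e.g. 'li(b,span)'."""
    if not node.children:
        return node.tag
    return node.tag + "(" + ",".join(signature(c) for c in node.children) + ")"


def leaf_texts(node):
    """All text fragments in a subtree, own text first, then children."""
    out = list(node.texts)
    for child in node.children:
        out.extend(leaf_texts(child))
    return out


def extract_records(html):
    """Return the rows of the heaviest group of structurally identical siblings."""
    builder = TreeBuilder()
    builder.feed(html)
    best_weight, best_rows = 0, []
    stack = [builder.root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        groups = defaultdict(list)
        for child in node.children:
            groups[signature(child)].append(child)
        for members in groups.values():
            if len(members) < 2:  # a pattern must repeat to count as a record list
                continue
            rows = [leaf_texts(m) for m in members]
            weight = sum(len(r) for r in rows)  # assumed stand-in for visual weight
            if weight > best_weight:
                best_weight, best_rows = weight, rows
    return best_rows


SAMPLE_HTML = """
<html><body>
<div><a>Home</a><a>About</a></div>
<ul>
<li><b>Camera</b><span>199.00</span></li>
<li><b>Phone</b><span>349.50</span></li>
<li><b>Tablet</b><span>279.99</span></li>
</ul>
</body></html>
"""

if __name__ == "__main__":
    for row in extract_records(SAMPLE_HTML):
        print(row)
```

In the sample page, both the navigation links and the product list repeat, but the product list covers more text fields and is therefore selected. A layout-aware system would instead rank the candidates by properties such as rendered area or position.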

The ViPER system itself has been developed on top of JREX, which enables users to access Mozilla's XPCOM interface from within Java. As ViPER is integrated into a meta search engine environment called ASTRO, we are currently working on a plugin prototype of the system that lets a user take advantage of the extraction power of ViPER while browsing the Web in Mozilla or Firefox.

For example, the ViPER plugin enables a user to easily generate an agent that monitors a dynamic Web page and sends a notification as soon as the price of an item drops below a certain limit. The number of online stores the agent can monitor is not limited.
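The core check of such a monitoring agent might look like the following sketch. It assumes the data records of each store have already been extracted into (item, price) rows; the function name `find_price_drops`, the `Alert` type, and the store names are hypothetical and not part of the ViPER plugin. Fetching the pages and delivering the notification are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    store: str
    item: str
    price: float


def find_price_drops(records_by_store, item, limit):
    """Return an Alert for every store whose price for `item` is below `limit`."""
    alerts = []
    for store, rows in records_by_store.items():
        for name, price_text in rows:
            if name != item:
                continue
            price = float(price_text)  # assumes plain decimal prices like "199.00"
            if price < limit:
                alerts.append(Alert(store, name, price))
    return alerts


if __name__ == "__main__":
    # One snapshot of extracted records; any number of stores can be added.
    snapshot = {
        "store-a": [("Camera", "199.00"), ("Phone", "349.50")],
        "store-b": [("Camera", "215.00")],
        "store-c": [("Camera", "189.90")],
    }
    for alert in find_price_drops(snapshot, "Camera", 200.0):
        print(f"{alert.item} is {alert.price:.2f} at {alert.store}")
```

An actual agent would repeat this check on a schedule against freshly extracted records and send the notification through whatever channel the user has configured.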


http://dbis.informatik.uni-freiburg.de/index.php?project=VIPER
