International Workshop on Internet Data Management (IDM'99),
Sept. 2, 1999, Firenze, Italy. In Proc. DEXA 99 Workshop, IEEE Computer Society Press.

Modeling and Querying Structure and Contents of the Web

Wolfgang May

Abstract: For accessing and processing the information provided on the Web, there is a need for extraction, restructuring, and integration of semistructured data from autonomous, heterogeneous sources. In this paper, we regard the Web and its contents as a unit, represented in a single object-oriented data model that covers the Web structure given by its hyperlinks (inter-document level), the parse trees of Web pages (intra-document level), and their contents. The model is complemented by a rule-based object-oriented language which is extended by Web access capabilities and allows for querying and navigation in the unified model. We show the practicability of our approach by using the FLORID system.

[The Paper (IEEE Digital Library)]

[Slides]