methodological focus,
technical focus)
| Master/Diploma | Bachelor/Student Research Projects (Studienarbeiten) |
| available |
Hierarchical SVMs in Recommender Systems
Recommender systems can be viewed as a classification problem, which can be addressed, e.g., by SVMs. A main difficulty of this approach is the large number of classes (typically in the thousands), combined with the finding that 1-vs-1 setups are superior to 1-vs-all setups in this domain. To keep the number of models to be learned manageably low, only restricted subsets of all 1-vs-1 contrasts can be computed; a domain taxonomy can be used to select these subsets.
Starting from an existing implementation of a simple collaborative filtering recommender system and some datasets (MovieLens, EachMovie), hierarchical SVM models should be learned. In experiments, the effect of a hand-crafted domain taxonomy should be compared with that of a random taxonomy and of a taxonomy induced by hierarchical clustering.
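A flat 1-vs-1 setup over k classes needs k(k-1)/2 binary SVMs. For k = 1000 that is 499,500 models. The sketch below counts how many models are needed under one possible restriction, where contrasts are learned only between sibling nodes of a taxonomy. It assumes a hypothetical nested-dict representation of the taxonomy. A balanced tree stands in for a real hand-crafted, random, or clustered taxonomy, and no SVMs are actually trained.

```python
from math import comb

def flat_one_vs_one(num_classes: int) -> int:
    """Binary models needed for all pairwise 1-vs-1 contrasts: k(k-1)/2."""
    return comb(num_classes, 2)

def taxonomy_one_vs_one(taxonomy: dict) -> int:
    """Binary models needed when 1-vs-1 contrasts are only learned between siblings.

    `taxonomy` is a hypothetical nested dict: node name -> dict of its children;
    an empty dict marks a leaf (i.e., an item class).
    """
    total = comb(len(taxonomy), 2)  # contrasts among the siblings at this level
    for children in taxonomy.values():
        total += taxonomy_one_vs_one(children)  # recurse into each subtree
    return total

def balanced_taxonomy(branching: int, depth: int) -> dict:
    """Balanced tree with branching**depth leaves (stand-in for a real taxonomy)."""
    if depth == 0:
        return {}
    return {f"node{i}": balanced_taxonomy(branching, depth - 1) for i in range(branching)}

if __name__ == "__main__":
    tree = balanced_taxonomy(branching=10, depth=3)  # 1000 leaf classes
    print(flat_one_vs_one(1000))      # 499500
    print(taxonomy_one_vs_one(tree))  # 45 * (1 + 10 + 100) = 4995
```

With 10 children per node and three levels, the sibling restriction cuts the model count by a factor of 100. How well such a restricted setup classifies depends on how well the taxonomy fits the data, and that is exactly what the proposed experiments compare.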
| available |
Survey and Empirical Analysis on Attribute-aware CF Recommender Systems
Many e-commerce sites use recommender systems to help their customers find suitable products in a large database. A recommender system (RS) learns from users and recommends products based on their tastes. Most RSs use Collaborative Filtering (CF) or Content-based Filtering (CBF) techniques.
While CF recommends items to a user based on other users with similar preferences, CBF recommends items by comparing the content/attributes of items in the user's history. Both methods have their advantages and shortcomings, so combining components of both should mitigate their individual weaknesses. A variety of such hybrid/attribute-aware CF techniques have been proposed. Each algorithm claims to outperform the others, and some share similar ideas but address different recommendation tasks or are evaluated on different, non-public datasets. However, there is usually no thorough investigation of these algorithms on common datasets and under common settings.
The task of this topic is to survey existing attribute-aware CF algorithms and to implement them. Experiments should use common public datasets and evaluation metrics so that the algorithms can be compared in a common and fair setting. In addition, the behavior of these algorithms should be studied when different kinds of item attributes/information are available.
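One of the simplest ways to combine both sources is a weighted hybrid that blends a CF prediction with a content-based prediction. The sketch below is only such a baseline, not any particular published attribute-aware CF algorithm. It makes the following assumptions:

- ratings are stored as nested dicts (user -> item -> rating);
- item attributes are sets of tags;
- users are compared with cosine similarity and items with Jaccard similarity;
- `alpha` is a free blending weight.

```python
from math import sqrt

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse rating vectors (item -> rating)."""
    num = sum(a[k] * b[k] for k in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def cf_score(ratings: dict, user: str, item: str) -> float:
    """User-based CF: similarity-weighted average of other users' ratings of `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else 0.0

def cbf_score(ratings: dict, attrs: dict, user: str, item: str) -> float:
    """Content-based: user's ratings weighted by Jaccard similarity of item attributes."""
    num = den = 0.0
    for rated, r in ratings[user].items():
        union = attrs[rated] | attrs[item]
        sim = len(attrs[rated] & attrs[item]) / len(union) if union else 0.0
        num += sim * r
        den += sim
    return num / den if den else 0.0

def hybrid_score(ratings: dict, attrs: dict, user: str, item: str, alpha: float = 0.5) -> float:
    """Weighted hybrid: alpha * CF + (1 - alpha) * CBF; alpha is a tuning parameter."""
    return alpha * cf_score(ratings, user, item) + (1 - alpha) * cbf_score(ratings, attrs, user, item)
```

Both component functions return 0.0 when they have no evidence. A fuller implementation would therefore need a fallback, such as a user or item mean, before blending. The empirical study would replace this baseline with the surveyed algorithms.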
| available |
Graph triangulation for Bayesian Network Inference
An undirected graph is called triangulated if every cycle of length greater than 3 has a chord, i.e., an additional edge between two non-consecutive vertices of the cycle. Obviously, edges can be added to any graph until it becomes triangulated (as the complete graph is triangulated). Triangulations are useful, for example, to compute a propagation structure for inference in Bayesian networks. Generally, the fewer edges have to be added, the more useful a triangulation is. Finding an optimal triangulation, i.e., one that adds the fewest edges, is NP-hard.
The task of this topic is to implement some simple graph triangulation algorithms as well as two of the more sophisticated, recently proposed algorithms, and to evaluate their performance (quality of the triangulation found vs. runtime) on random graphs as well as on Bayesian network graphs.
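The greedy min-fill heuristic is a typical example of the simple algorithms mentioned above. It repeatedly eliminates the vertex whose neighbors need the fewest additional edges to become a clique, and records those fill-in edges. The sketch below assumes graphs are given as adjacency dicts with sortable vertex names, and it breaks ties by name. It is only a heuristic, not one of the more sophisticated recently proposed algorithms, and it does not guarantee minimum fill-in.

```python
from itertools import combinations

def _fill_edges(work, v):
    """Edges needed to turn v's current neighbours into a clique."""
    return [(a, b) for a, b in combinations(sorted(work[v]), 2) if b not in work[a]]

def triangulate_min_fill(adj):
    """Greedy min-fill elimination.

    adj: dict vertex -> set of neighbours (undirected, vertex names sortable).
    Returns (elimination order, set of fill-in edges); adding the fill-in
    edges to the original graph yields a triangulated graph.
    """
    work = {v: set(ns) for v, ns in adj.items()}
    order, fill = [], set()
    while work:
        v = min(work, key=lambda u: (len(_fill_edges(work, u)), u))
        for a, b in _fill_edges(work, v):
            work[a].add(b)
            work[b].add(a)
            fill.add((a, b))
        for u in work[v]:
            work[u].discard(v)  # eliminate v from the remaining graph
        del work[v]
        order.append(v)
    return order, fill

def add_edges(adj, edges):
    """Copy of `adj` with the given undirected edges added."""
    out = {v: set(ns) for v, ns in adj.items()}
    for a, b in edges:
        out[a].add(b)
        out[b].add(a)
    return out

def cycle(names):
    """Undirected cycle through the given vertex names, e.g. cycle("abcd")."""
    n = len(names)
    return {names[i]: {names[i - 1], names[(i + 1) % n]} for i in range(n)}
```

The number of fill-in edges measures triangulation quality as described above. Timing `triangulate_min_fill` on random graphs gives the runtime side of the comparison. On an already triangulated graph the heuristic adds no edges, because such a graph always has a vertex whose neighbors already form a clique.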
Learning Bayesian Network Structures
Learning the structure of Bayesian networks from data is one of the core tasks for Bayesian networks and has recently been addressed by several algorithms. Starting from the CGNM/BN implementation in Java, which supports basic tasks such as I/O, inference, and parameter learning, some simple and fast algorithms such as K2 should be implemented first. Building on that, two more complex algorithms (PC and GES) should be implemented efficiently. The performance of these algorithms should be evaluated in terms of the quality of the solutions found as well as runtime on real-life datasets. Finally, synthetic datasets generated from synthetic BNs should be used to assess the algorithms under controllable conditions.
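K2 (Cooper and Herskovits) is a natural starting point among the simple algorithms. It scores each node's parent set with the K2 metric, a Bayesian-Dirichlet score with uniform priors. It then greedily adds the best-scoring parent from among the node's predecessors in a given ordering. The sketch below is standalone Python, not tied to the Java CGNM/BN implementation. It assumes complete discrete data given as a list of dicts, a known variable ordering, and a hypothetical `max_parents` limit.

```python
from collections import Counter
from math import lgamma

def log_k2(data, child, parents, arity):
    """Log K2 score of `child` given a parent list.

    data: list of dicts mapping variable -> value in range(arity[variable]).
    """
    r = arity[child]
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_ij = Counter()
    for (config, _), n in n_ijk.items():
        n_ij[config] += n
    score = 0.0
    for config, n in n_ij.items():
        # log((r-1)!) - log((N_ij + r - 1)!) + sum_k log(N_ijk!), using lgamma(x + 1) = log(x!)
        score += lgamma(r) - lgamma(n + r)
        score += sum(lgamma(n_ijk[(config, k)] + 1) for k in range(r))
    return score

def k2(data, order, arity, max_parents=2):
    """Greedy K2: parents of each node are chosen only among its predecessors in `order`."""
    parents = {v: [] for v in order}
    for i, child in enumerate(order):
        best = log_k2(data, child, [], arity)
        while len(parents[child]) < max_parents:
            pool = [z for z in order[:i] if z not in parents[child]]
            if not pool:
                break
            score, best_z = max(
                (log_k2(data, child, parents[child] + [cand], arity), cand) for cand in pool
            )
            if score <= best:
                break  # no candidate improves the score: stop adding parents
            best = score
            parents[child].append(best_z)
    return parents
```

Because the score decomposes per node, each node's parents can be searched independently. PC and GES drop the ordering requirement and would need considerably more machinery.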
Mixture Models for Wafer Failure Analysis
During the production of semiconductor chips, silicon wafers consisting of 100 to 1000 parts are processed in many different steps. The functionality of the chips can only be tested after they have been completely finished. To ensure high quality of the final product, up to several hundred tests are conducted per chip. As a chip is already discarded after a single failing test, a test sequence is stopped at the first failure. The vector of all test results is called the failure vector or fingerprint.
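A test sequence stops at the first failure, so a fingerprint consists of a run of passes, at most one failure, and then tests that were never executed. The sketch below illustrates this structure. It assumes a hypothetical callable `passes(t)` that simulates the outcome of test t on one chip.

```python
from enum import Enum
from typing import Callable, List

class Result(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_RUN = "not run"  # tests after the first failure are skipped

def fingerprint(passes: Callable[[int], bool], num_tests: int) -> List[Result]:
    """Run tests 0..num_tests-1 in order, stopping at the first failure."""
    results: List[Result] = []
    for t in range(num_tests):
        if passes(t):
            results.append(Result.PASS)
        else:
            results.append(Result.FAIL)
            results.extend([Result.NOT_RUN] * (num_tests - t - 1))
            break
    return results
```

One consequence matters for the mixture modelling. With n tests run in a fixed order, only n + 1 distinct fingerprints are possible: the first failure occurs at test 0, ..., n - 1, or there is no failure at all.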
In real production environments, detractors can cause failures. Usually such failures are not due to a single cause, but to typical mixtures of different causes.
The task of this diploma thesis topic is to adapt mixture models based on Bayesian Networks to the problem, implement a suitable learning algorithm and run empirical experiments. Different types of data should be analyzed:
This diploma thesis topic is offered jointly with Infineon Technologies, Regensburg.
Integrating Bayesian Networks and Collaborative Filtering for Recommender Systems
One of the most successful recent models for recommender systems is a simple Bayesian network consisting of only four nodes for user ID, item ID, rating, and a hidden class, the so-called "aspect model" by Hofmann (2004). While the aspect model gives high-quality recommendations, its learning process based on the expectation-maximization (EM) algorithm is slow. Traditionally, simple nearest-neighbor models, known as collaborative filtering, have been used for this task; compared to the aspect model, they are fast. To improve prediction accuracy, probabilistic ideas have been integrated into collaborative filtering techniques.
The task of this topic is to work the other way around and use initialization schemes based on collaborative filtering techniques to accelerate Bayesian network learning. Furthermore, the results should be compared with plain collaborative filtering, with the aspect model, and with probabilistic collaborative filtering.
An editor generator for instances of XML Schema
Writing a schema-valid XML document places several demands on the author. The author has to know the rules of well-formed XML as well as the XML Schema standard. Last but not least, the author has to be aware of the specific schema the document has to conform to.
Schema-aware text editors that help the author with most of these issues already exist. In this thesis, however, an editor generator is to be designed and implemented that produces a dedicated editor for each schema.
The generated editor should provide a form-based way to edit schema-valid XML documents. In addition, it should be able to validate constraints that exceed the expressive power of XML Schema.
This topic is offered in cooperation with Vector Consulting GmbH, Stuttgart.
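A generated editor could check such additional constraints after ordinary schema validation. The sketch below uses Python's standard `xml.etree.ElementTree` and a hypothetical document type in which each `<project>` has `<start>` and `<end>` dates. The constraint "the end date must not precede the start date" compares the values of two elements. XML Schema 1.0 cannot express this kind of constraint; XML Schema 1.1 added assertions for such cases.

```python
import xml.etree.ElementTree as ET

def check_date_order(xml_text: str) -> list:
    """Report <project> elements whose <end> date precedes their <start> date.

    Element names are hypothetical; dates are assumed to be ISO 8601 (YYYY-MM-DD),
    which compare correctly as plain strings.
    """
    errors = []
    for project in ET.fromstring(xml_text).iter("project"):
        start, end = project.findtext("start"), project.findtext("end")
        if start and end and end < start:
            errors.append(f"project {project.get('id')}: end {end} precedes start {start}")
    return errors
```

In a generator, such checks would be derived per schema from additional constraint declarations. How those declarations are written is a design decision of the thesis.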
Collaborative Personal Ontology Evolution
Ontologies specify knowledge about a domain of interest using formal semantics. One example is the products offered by an online shop together with a taxonomy of product categories that organizes them for better browsing and searching. Another is an information portal such as a digital library that organizes books in a hierarchical category system.
Personalized semantic applications allow users to have a personal copy of the ontology and to tailor it to their needs: e.g., they can choose to see only a subset of the available categories, merge existing categories, or introduce completely new ones. In large information portals, one can try to support users in maintaining their personal ontology by recommending changes based on the ontologies of other users. For example, suppose a user keeps many articles about "Java" and "C++" in a single category "programming languages" in their personal bibliography, while other users organize such articles better using two subcategories "Java" and "C++". One would then like to recommend that the user add these two subcategories and reassign the papers accordingly.
Methods from machine learning, especially from recommender systems and collaborative filtering, can be applied to learn such recommendations.
The task of this topic is to implement different strategies for recommending categories, their super- and subcategories, and the assignment of products to categories. These should include very simple strategies that take other users' ontologies into account only in aggregate, e.g., always recommending the most frequently used concept first, as well as personalized recommenders based on collaborative filtering methods. The strategies should be evaluated on several synthetic datasets.
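The two kinds of strategies can be contrasted in a few lines. The sketch below simplifies each personal ontology to a flat set of category names, so it ignores super-/subcategory relations and product assignments, which the actual task includes. The function names are hypothetical.

- `most_popular` implements the aggregate strategy mentioned above.
- `personalized` weights other users' categories by the Jaccard similarity of their ontologies to the user's own, a simple collaborative-filtering-style neighborhood scheme.

```python
from collections import Counter

def jaccard(a: set, b: set) -> float:
    """Overlap of two category sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def most_popular(own: set, others: list, k: int = 3) -> list:
    """Non-personalized: categories used by most other users, excluding the user's own."""
    counts = Counter(c for cats in others for c in cats)
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return [c for c in ranked if c not in own][:k]

def personalized(own: set, others: list, k: int = 3) -> list:
    """CF-style: each other user's new categories are weighted by their similarity to `own`."""
    scores = Counter()
    for cats in others:
        sim = jaccard(own, cats)
        for c in cats - own:
            scores[c] += sim
    ranked = sorted((c for c in scores if scores[c] > 0), key=lambda c: (-scores[c], c))
    return ranked[:k]
```

Take a user who only has "programming languages". `most_popular` would rank "Java" first because most users have it. `personalized` only draws on users whose ontologies overlap with the user's own.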
Product Identification and Clustering
Automatically structuring offers is a key task in e-commerce. Especially if offers are collected from different shops, two problems arise:
- identifying products across offers (e.g., the same product offered in different shops);
- grouping similar products into categories based on metadata about the products, which may be extracted automatically from HTML pages and thus be dirty.
Although hand-crafted identifications and groupings are usually of very good quality, this approach is too expensive and, for product identification, too slow.
The task of this topic is to use information extraction methods to extract suitable features from the textual product metadata. The products should then be clustered with a heuristic similarity measure based on string distances (Levenshtein/edit distance) and term weighting schemes (tf.idf). Models for two or three different domains should be built with these methods on real-life datasets. As these heuristics are expected to be domain-specific, in a second step the similarity measure should be adapted automatically based on explicit user feedback, possibly collected via active learning.
This topic is offered jointly with Mentasys GmbH, Karlsruhe, who provide data and domain expertise.
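The heuristic similarity measure can be sketched as a weighted blend of two parts: a normalized edit-distance similarity on product titles and a tf.idf cosine similarity on title tokens. Products are then grouped by threshold-based single linkage. The blending weight `beta`, the threshold, the whitespace tokenization, and the idf variant log(N/df) are illustrative assumptions. These are precisely the domain-specific choices that the second step of the topic would adapt from user feedback.

```python
import math
from collections import Counter

def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def tfidf_vectors(docs):
    """Sparse tf.idf vectors with idf = log(N / df) over whitespace tokens."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()} for toks in tokenized]

def cosine(u, v):
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    den = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return num / den if den else 0.0

def similarity(a, b, vec_a, vec_b, beta=0.5):
    """Blend of normalized edit similarity and tf.idf cosine (beta is hypothetical)."""
    edit_sim = 1 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b), 1)
    return beta * edit_sim + (1 - beta) * cosine(vec_a, vec_b)

def cluster(titles, threshold=0.5, beta=0.5):
    """Single-linkage grouping: merge any pair whose similarity reaches `threshold`."""
    vecs = tfidf_vectors(titles)
    parent = list(range(len(titles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if similarity(titles[i], titles[j], vecs[i], vecs[j], beta) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i, title in enumerate(titles):
        groups.setdefault(find(i), []).append(title)
    return list(groups.values())
```

Single linkage is simple but can chain dissimilar offers together through intermediate ones. A real setup would probably combine it with extracted features such as brand or model number.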
Ensembles of relational and text-based models with applications to the classification of scientific publications
In traditional text classification, only attributes of the document itself (like words in its title or abstract) are considered. But many documents are related to other documents through their metadata, for instance through references, shared authors, conferences, or journals.
The goal of this topic is to use these relationships for classification and to analyse whether they improve classification accuracy. To this end, several methods from the area of probabilistic relational learning should be applied to three bibliographic datasets. Furthermore, it should be investigated whether a combination of relational classification and traditional text classification can improve classification accuracy.
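A simple relational baseline, in the spirit of relational-neighbor classifiers, classifies a publication by a weighted vote over the known classes of the publications it is linked to. Links could, for example, be weighted by the number of shared authors or citations. An ensemble then averages this class distribution with that of a text classifier. The sketch below assumes hypothetical dict-based link and label structures and an illustrative mixing weight `gamma`.

```python
from collections import Counter

def relational_vote(doc, links, labels):
    """Class distribution for `doc` from the labels of linked documents.

    links: doc -> {neighbour: link weight}; labels: doc -> known class.
    Returns an empty dict if no labelled neighbour exists.
    """
    votes = Counter()
    for neighbour, weight in links.get(doc, {}).items():
        if neighbour in labels:
            votes[labels[neighbour]] += weight
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()} if total else {}

def ensemble(text_probs, rel_probs, gamma=0.5):
    """Mix text-based and relational class distributions; fall back to text only."""
    if not rel_probs:
        return dict(text_probs)
    classes = set(text_probs) | set(rel_probs)
    return {c: gamma * text_probs.get(c, 0.0) + (1 - gamma) * rel_probs.get(c, 0.0) for c in classes}
```

Collective-classification variants repeat such votes iteratively, feeding back the predicted labels of unlabeled neighbors.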
A collaborative, bibliographic Wiki
Collaboration over the internet is becoming increasingly important in research and industry. Many different application patterns such as Wikis have emerged to support the ad-hoc style often encountered in such collaborations. While Wikis are perfectly suited for free-style, loosely linked texts, the management of more structured information such as bibliographic collections is not well supported (but see, e.g., wikindx for an existing bibliographic Wiki).
Starting from a review of existing collaborative, web-based bibliographic tools (see, e.g., the Resource list of OpenOffice for such tools) and their main features, a design for such a tool based on a wiki platform should be developed. It should be implemented on top of XWiki (or any other suitable Java-based wiki platform). The implementation should focus on three aspects:
(1) the management of access rights to individual records or sensitive annotations;
(2) the ability to annotate records with rating information;
(3) the ability to keep a personal bibliography per user, organized in a user-defined hierarchy, as well as branches shared with other users.
Learning models for ACM classification
Computer science literature, i.e., books and articles, is classified manually according to the ACM classification (e.g., "I.2" is artificial intelligence; ACM = Association for Computing Machinery, one of the large international computer science societies).
The goal of this topic is to learn a model, e.g., a Bayesian network, that classifies a paper by its metadata, i.e., title, author, journal or conference, year, etc. This involves methods from text mining/information retrieval (e.g., variables may be constructed from title keywords). It also involves some advanced data mining methods, e.g., dimensionality reduction, as typically a huge number of variables is involved. A database with 200,000 hand-labeled training examples is available.
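Naive Bayes is the simplest Bayesian network for this purpose, with the class variable as the single parent of every word variable, and it makes a reasonable baseline. The sketch below uses title keywords only. It assumes whitespace tokenization, multinomial word counts, and additive smoothing `alpha`. Authors, venues, and years could be added as further variables in the same way.

```python
import math
from collections import Counter, defaultdict

class TitleNaiveBayes:
    """Multinomial naive Bayes over lower-cased, whitespace-separated title words."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace/additive smoothing

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            self.word_counts[label].update(title.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, title, label):
        """Unnormalized log P(label | title); words unseen in training are ignored."""
        n_docs = sum(self.class_counts.values())
        lp = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for w in title.lower().split():
            if w in self.vocab:
                lp += math.log((self.word_counts[label][w] + self.alpha) / denom)
        return lp

    def predict(self, title):
        return max(self.class_counts, key=lambda c: self.log_posterior(title, c))
```

With a huge number of keyword variables, the vocabulary would in practice be pruned or reduced first, which is where the dimensionality reduction mentioned above comes in.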
Attribute-aware Volatile Recommender Systems
Many recommender systems view products as "atomic entities" without any attributes. In most application scenarios, however (including all e-commerce scenarios), product attributes are well known. Not using these attributes when computing recommendations seems to waste some of the most valuable information available. Volatile recommender systems do not identify users, but are task-driven (see, e.g., karstadt).
The task of this topic is to design and implement a framework for the evaluation of volatile recommender systems for products with attributes. Using an interface to data mining software, different modelling setups, learning algorithms, and models should be compared on a real-life dataset.
Attribute-aware Personalized Recommender Systems
Many recommender systems view products as "atomic entities" without any attributes. In most application scenarios, however (including all e-commerce scenarios), product attributes are well known. Not using these attributes when computing recommendations seems to waste some of the most valuable information available. Personalized recommender systems identify users and should thus learn user-individual preferences (see, e.g., Amazon; you will have to register to use the system).
The task of this topic has two parts. First, a real-life dataset should be enriched with product features by wrapping information from an information portal. Second, a framework for the evaluation of personalized recommender systems for products with attributes should be designed and implemented. Using an interface to data mining software, different modelling setups, learning algorithms, and models should be compared.
Semantic Peer-to-Peer Recommender Systems
Recommender systems aim at making the experiences of other users available, so they typically use a centralized information pool, e.g., to compute neighborhoods in collaborative filtering. In a peer-to-peer scenario, such a central knowledge repository is not available; only information from peers is. Caching strategies or dynamic peer selection have to be used to compute useful recommendations from local information.
The goal of this topic is to implement a simulation framework for semantic peer-to-peer recommender systems using a simple nearest-neighbor-based recommendation algorithm running locally on a peer (i.e., having access only to its peers). The domain (e.g., books, music) should be modelled by a domain ontology. Experiments should be run to assess different strategies for caching and peer selection.
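The local recommendation step can be sketched as user-based k-nearest-neighbor CF, restricted to the profiles a peer can actually see. The overlay `known_peers` is hypothetical. It is the place where caching and peer-selection strategies would plug in, because it decides which profiles are visible. The domain ontology is left out of this sketch.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two sparse rating profiles (item -> rating)."""
    num = sum(a[k] * b[k] for k in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def local_recommend(peer, known_peers, ratings, k=2, top_n=3):
    """kNN recommendation computed on one peer, using only peers it knows.

    known_peers: hypothetical overlay, peer -> set of peers whose profiles it can see
    (directly connected or cached); ratings: peer -> {item: rating}.
    """
    own = ratings[peer]
    visible = [p for p in known_peers.get(peer, ()) if p in ratings]
    neighbours = sorted(((cosine(own, ratings[p]), p) for p in visible), reverse=True)[:k]
    scores = Counter()
    for sim, p in neighbours:
        if sim <= 0:
            continue  # dissimilar peers contribute nothing
        for item, r in ratings[p].items():
            if item not in own:
                scores[item] += sim * r
    return [item for item, _ in sorted(scores.items(), key=lambda x: (-x[1], x[0]))][:top_n]
```

Enlarging a peer's visible set, for example by caching profiles seen earlier, can change its recommendations. Measuring this effect is the point of the planned experiments.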
Design and Prototypical Implementation of a Platform for Personalized Recommender Systems
A platform for personalized recommender systems has to provide access to an online information system and systematically track user actions. From these actions it has to derive preference indicators, e.g., which products a user looks at, how long they stay with a product, which products they buy, etc. The platform should be able to use arbitrary recommender system models via an interface. Furthermore, it should address data management, e.g., allow users to edit and correct preference indicators that have been extracted automatically.
There are several tasks to solve for this topic: First, existing platforms for personalized recommender systems on the internet (at Amazon and many other shops) have to be analyzed. Second, requirements for a specific application scenario have to be specified. Third, a generic database structure for such a system has to be developed. Fourth, a generic prototypical system has to be implemented and set up in a specific application context. Optionally, the thesis may contain some first observations on how users use the platform (preliminary descriptive usage analysis).
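The sketch below shows one way preference indicators could be derived from tracked actions. The event types, weights, and dwell-time cap are hypothetical placeholders that a real platform would have to calibrate. `apply_corrections` mirrors the requirement that users can edit or delete automatically extracted indicators.

```python
from dataclasses import dataclass

@dataclass
class Event:
    user: str
    product: str
    kind: str             # hypothetical event types: "view" or "buy"
    seconds: float = 0.0  # dwell time, only meaningful for views

WEIGHTS = {"view": 1.0, "buy": 5.0}  # hypothetical weights
DWELL_CAP = 300.0                    # hypothetical cap on counted dwell time (seconds)

def preference_indicators(events):
    """Aggregate tracked events into one score per (user, product)."""
    scores = {}
    for e in events:
        w = WEIGHTS.get(e.kind, 0.0)
        if e.kind == "view":
            w *= 1.0 + min(e.seconds, DWELL_CAP) / DWELL_CAP  # long stays count up to double
        key = (e.user, e.product)
        scores[key] = scores.get(key, 0.0) + w
    return scores

def apply_corrections(scores, corrections):
    """User edits override derived indicators; a value of None deletes the indicator."""
    result = dict(scores)
    for key, value in corrections.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result
```

In the proposed platform, both the raw events and the user corrections would live in the generic database structure, so that indicators can be recomputed.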
A Generic Data Warehouse Model for Recommender Systems
This topic consists of a theoretical and a practical part.
In the theoretical part, the state of the art of modelling multidimensional schemas for data warehouses should be researched. The focus here is on conceptual models; implementation models and schema maps (such as star and snowflake schemas) should not be covered. Results should contain:
1) a short description and structuring of the different approaches proposed;
2) a description of the main differences between the approaches regarding schema constructs, representation of constructs, general expressiveness, and handling of typical modelling problems in data warehouses (DWH);
3) an in-depth description of two of the most promising approaches.
In the practical part, a case study should be conducted, building a generic data warehouse model for recommender systems. The main focus here is on the specification of a suitable and flexible model in one of the modelling languages covered in depth in the first part. At a minimum, the core data of anonymous and volatile recommender systems should be modelled. This includes task profiles, product data, recommendation lists, and preference indicators, together with the corresponding micro-conversion rates.
As an indicator of excellence, the data warehouse model could be implemented in a prototype that demonstrates the feasibility of the approach. Real-life data for such an experiment is available.
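Micro-conversion rates can be computed as ratios between consecutive steps of a funnel, counting distinct sessions per step. The funnel steps below are a hypothetical example for recommendation lists. In a data warehouse, these counts would come from aggregating a fact table of tracked events rather than from a Python list.

```python
FUNNEL = ["recommended", "clicked", "carted", "bought"]  # hypothetical funnel steps

def step_counts(events):
    """Count distinct sessions reaching each funnel step; events are (session_id, step)."""
    sessions = {step: set() for step in FUNNEL}
    for session_id, step in events:
        if step in sessions:
            sessions[step].add(session_id)
    return {step: len(ids) for step, ids in sessions.items()}

def micro_conversion_rates(counts):
    """Rate of each step relative to the previous one (0.0 if the previous step is empty)."""
    return {
        f"{prev}->{nxt}": counts[nxt] / counts[prev] if counts[prev] else 0.0
        for prev, nxt in zip(FUNNEL, FUNNEL[1:])
    }
```

Keeping the step counts as additive measures and deriving the rates at query time avoids storing ratios, which cannot be summed across dimensions.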