Institute for Computer Science

Machine Learning and Natural Language Processing Lab


Master Thesis

Collective Classification using a Relational Naive Bayes Classifier

Tayfun Gürel, 2004


Traditional machine learning algorithms assume that data instances are independent of one another. In this work, we present a relational machine learning algorithm for classification that does not make this assumption. Instead, we model the data as an undirected graph whose nodes are the data instances and whose links represent the relations among them. The class labels of two instances are assumed to depend directly on each other if the instances are linked (label-to-label dependency). The algorithm also makes use of the unlabeled data in order to estimate the label-to-label dependencies among the data instances more accurately. It is based on maximum likelihood parameter learning and combines hard Expectation Maximization (hard EM) with the naive Bayes classifier. Our algorithm can be characterized as a Collective Classification approach (Taskar et al., 2002): it aims at finding the best classification for the whole data set collectively by exploiting the label-to-label dependencies.
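
The following is a minimal sketch of how hard EM and a naive Bayes model over neighbor labels might be combined on a graph. It is not the thesis implementation. The function name `collective_classify`, the majority-vote initialization, the Laplace smoothing parameter `alpha`, and the synchronous update schedule are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter


def collective_classify(nodes, edges, known, classes, max_iter=20, alpha=1.0):
    """Hard-EM sketch: label unlabeled nodes from their neighbors' labels.

    nodes: list of node ids; edges: list of undirected (u, v) pairs;
    known: dict node -> class for labeled nodes (assumed non-empty);
    classes: list of class labels. All names here are hypothetical.
    """
    nbrs = {n: set() for n in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Assumed initialization: majority vote of labeled neighbors,
    # falling back to the most frequent known class.
    default = Counter(known.values()).most_common(1)[0][0]
    labels = dict(known)
    for n in nodes:
        if n not in known:
            votes = Counter(known[m] for m in nbrs[n] if m in known)
            labels[n] = votes.most_common(1)[0][0] if votes else default

    for _ in range(max_iter):
        # M-step: maximum likelihood counts (with Laplace smoothing)
        # computed from the current hard assignment of all nodes.
        prior = Counter(labels.values())
        pair = Counter()  # pair[(own, neighbor)] over both edge directions
        for u, v in edges:
            pair[(labels[u], labels[v])] += 1
            pair[(labels[v], labels[u])] += 1

        def log_prior(c):
            return math.log((prior[c] + alpha) / (len(nodes) + alpha * len(classes)))

        def log_cond(neighbor_class, c):
            total = sum(pair[(c, k)] for k in classes)
            return math.log((pair[(c, neighbor_class)] + alpha) / (total + alpha * len(classes)))

        # E-step (hard): each unlabeled node takes its most probable class,
        # treating neighbor labels as conditionally independent given its own.
        new = dict(labels)
        for n in nodes:
            if n in known:
                continue
            new[n] = max(
                classes,
                key=lambda c: log_prior(c) + sum(log_cond(labels[m], c) for m in nbrs[n]),
            )
        if new == labels:  # converged: no assignment changed
            break
        labels = new
    return labels
```

On a toy graph with two triangles joined by a single link, where two nodes in each triangle are labeled, this sketch assigns each unlabeled node the class of its own triangle. Known labels are never overwritten; only the unlabeled nodes are reassigned between iterations.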