Institute for Computer Science
Machine Learning and Natural Language Processing Lab
Master Thesis

Logistic Model Trees

Tree induction methods and linear regression are popular techniques for supervised learning tasks, both for the prediction of discrete classes and of numeric quantities. The two schemes have somewhat complementary properties: the simple linear models fit by regression exhibit high bias and low variance, while tree induction fits more complex models, which results in lower bias but higher variance. For predicting numeric quantities, there has been work on combining these two schemes into "model trees", i.e. trees that contain linear regression functions at the leaves [Quinlan, 1992].

In this paper, we present an algorithm that adapts this idea to classification problems. In statistics, the analogue of linear regression for classification tasks is linear logistic regression, so our method builds classification trees with linear logistic regression functions at the leaves. We describe a stagewise fitting process that builds the different logistic regression functions in the tree by incremental refinement, using the LogitBoost algorithm [Friedman et al., 2000], and we show how this approach can be used to automatically select the most relevant attributes to be included in the logistic models. We compare our algorithm to several other state-of-the-art learning schemes on 32 benchmark UCI datasets, and conclude that it produces accurate classifiers and good estimates of the class membership probabilities.

Friedman, J., Hastie, T. and Tibshirani, R. [2000]. Additive logistic regression: a statistical view of boosting. The Annals of Statistics, 28(2), 337-374.

Quinlan, J.R. [1992]. Learning with continuous classes. In 5th Australian Joint Conference on Artificial Intelligence (pp. 343-348).
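The stagewise fitting idea mentioned in the abstract can be illustrated with a minimal sketch of two-class LogitBoost in the style of Friedman et al. [2000]. This is not the thesis implementation: it covers only a single, flat logistic model and omits tree growing and pruning. Two points are assumptions made for illustration: each iteration fits a simple regression on one attribute (the one with the smallest weighted squared error), which is one way to obtain attribute selection, and the working responses and weights are clipped for numerical stability. The names `fit_simple_regression`, `logitboost`, and `predict_proba` are hypothetical.

```python
import math


def fit_simple_regression(xs, zs, ws):
    """Weighted least-squares fit of z ~ a + b*x.

    Returns (a, b, weighted_squared_error).
    """
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    mz = sum(w * z for w, z in zip(ws, zs)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxz = sum(w * (x - mx) * (z - mz) for w, x, z in zip(ws, xs, zs))
    b = sxz / sxx if sxx > 1e-12 else 0.0  # constant attribute: slope 0
    a = mz - b * mx
    sse = sum(w * (z - a - b * x) ** 2 for w, x, z in zip(ws, xs, zs))
    return a, b, sse


def logitboost(X, y, n_iter=20):
    """Two-class LogitBoost with one-attribute regressions as base learners.

    X is a list of numeric rows, y a list of 0/1 labels.
    Returns (intercept, coefs) of the additive model F(x).
    """
    n, d = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * d
    F = [0.0] * n
    for _ in range(n_iter):
        # Current probability estimates p(x) = 1 / (1 + exp(-2 F(x))).
        p = [1.0 / (1.0 + math.exp(-2.0 * f)) for f in F]
        # Weights and working responses; both are bounded (assumed safeguard).
        ws = [max(pi * (1.0 - pi), 1e-10) for pi in p]
        zs = [max(-4.0, min(4.0, (yi - pi) / wi))
              for yi, pi, wi in zip(y, p, ws)]
        # Try every attribute and keep the best single-attribute fit.
        best = None
        for j in range(d):
            col = [row[j] for row in X]
            a, b, sse = fit_simple_regression(col, zs, ws)
            if best is None or sse < best[3]:
                best = (j, a, b, sse)
        j, a, b, _ = best
        # Stagewise update: F(x) <- F(x) + 0.5 * f_m(x).
        intercept += 0.5 * a
        coefs[j] += 0.5 * b
        F = [f + 0.5 * (a + b * row[j]) for f, row in zip(F, X)]
    return intercept, coefs


def predict_proba(model, row):
    """Estimated probability of class 1 for a single row."""
    intercept, coefs = model
    F = intercept + sum(c * x for c, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-2.0 * F))


if __name__ == "__main__":
    # Toy data (made up for illustration): class 1 becomes more likely
    # as the first attribute grows.
    X = [[0, 1], [1, 0], [2, 1], [3, 0], [3, 0],
         [4, 1], [4, 1], [5, 0], [6, 1], [7, 0]]
    y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
    model = logitboost(X, y, n_iter=20)
    print("intercept and coefficients:", model)
    print("P(class 1 | [7, 0]):", predict_proba(model, [7, 0]))
```

Because every iteration updates the coefficient of a single attribute, any attribute that is never chosen keeps a zero coefficient and drops out of the model. In a model tree, the same loop could be continued on the subset of instances at a child node, starting from the parent's `F` values, which is the sense in which the leaf models are built by incremental refinement.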