Institute for Computer Science



Prediction of Chemical Carcinogenicity

Structure-Activity Relationships (SARs) play an expanding role in estimating the biological effects of chemicals. The main obstacle to applying them to the prediction of toxic effects is that performing the experiments needed to develop an SAR model is very time-consuming and expensive. It is therefore advantageous to use experimental data that are already available, but these datasets contain results from experiments with compounds of very diverse (non-congeneric) structures. Traditional SAR methods (used, e.g., in pharmaceutical research) rely on the presence of a common substructure and are therefore not suited to this type of problem.

We develop and apply data mining methods based on Inductive Logic Programming (ILP) to extract SARs from the Carcinogenic Potency Database (CPDB). These programs detect relationships between chemical structures and carcinogenic properties in a training data set (e.g. the CPDB) and use them to predict the carcinogenicity of untested compounds. The resulting models are interpretable by chemists and toxicologists, which is not the case for other machine learning methods such as neural networks. This means that they can be used not only to predict carcinogenicity, but also to identify the structural features that lead to it. They can be applied to predict toxic properties at a very early stage of product development, and they provide guidelines for the design and synthesis of safe chemicals.
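
The idea of learning interpretable structure-based rules from a training set and applying them to untested compounds can be illustrated with a minimal Python sketch. It is not the ILP method used in this project: it is a much simpler propositional stand-in that scores single structural features, whereas ILP can learn relational rules over atoms and bonds. The feature names, the toy data, and the functions `learn_rules` and `predict` are all hypothetical and invented for illustration; none of the data comes from the CPDB.

```python
from collections import Counter

# Toy training set: (set of structural features, is_carcinogen).
# Feature names and labels are invented for illustration, NOT taken from the CPDB.
TRAINING = [
    ({"aromatic_amine", "benzene_ring"}, True),
    ({"aromatic_amine", "methyl"}, True),
    ({"nitroso", "methyl"}, True),
    ({"benzene_ring", "hydroxyl"}, False),
    ({"methyl", "hydroxyl"}, False),
    ({"carboxyl", "benzene_ring"}, False),
]


def learn_rules(examples, min_support=2, min_precision=1.0):
    """Return {feature: precision} for features associated with activity.

    A feature becomes a rule if it occurs in at least `min_support`
    compounds and the fraction of those compounds that are carcinogenic
    is at least `min_precision`.
    """
    total = Counter()     # number of compounds containing each feature
    positive = Counter()  # number of carcinogenic compounds containing it
    for features, active in examples:
        for feature in features:
            total[feature] += 1
            if active:
                positive[feature] += 1

    rules = {}
    for feature, count in total.items():
        precision = positive[feature] / count
        if count >= min_support and precision >= min_precision:
            rules[feature] = precision
    return rules


def predict(rules, features):
    """Predict carcinogenicity and report which rules fired.

    Returns (prediction, fired_features) so that the reason for a
    positive prediction stays visible to the reader.
    """
    fired = sorted(f for f in features if f in rules)
    return bool(fired), fired


RULES = learn_rules(TRAINING)

if __name__ == "__main__":
    print("Learned rules:", RULES)
    print(predict(RULES, {"aromatic_amine", "carboxyl"}))
    print(predict(RULES, {"hydroxyl", "methyl"}))
```

Because each prediction is returned together with the features that triggered it, the output can be read as a structural explanation rather than an opaque score, which is the property the paragraph above attributes to interpretable models.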