Prediction of Chemical Carcinogenicity

Structure-activity relationships (SARs) play an expanding role in estimating the biological effects of chemicals. The main obstacle to applying them to the prediction of toxic effects is that performing the experiments needed to develop an SAR model is very time-consuming and expensive. It is therefore advantageous to use experimental data that are already available, but these datasets contain results from experiments with compounds of very diverse (non-congeneric) structures. Traditional SAR methods (used, e.g., in pharmaceutical research) rely on the presence of a common substructure and are therefore not suited to this type of problem.

We develop and apply data mining methods based on Inductive Logic Programming (ILP) to extract SARs from the Carcinogenic Potency Database (CPDB). These programs detect relationships between chemical structures and carcinogenic properties in a learning dataset (e.g. the CPDB) and use them to predict the properties of untested compounds. The resulting models are interpretable by chemists and toxicologists, which is not the case for other machine learning methods (e.g. neural networks). This means that they can be used not only for predicting carcinogenicity but also for identifying structural features that lead to carcinogenicity. They can be applied to predict toxic properties at a very early stage of product development and provide guidelines for the design and synthesis of safe chemicals.
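
The idea of learning interpretable structure-based rules from a dataset and applying them to untested compounds can be illustrated with a minimal Python sketch. This is not ILP and not the method used in this project: it is a much simpler propositional stand-in that only counts single fragments. The fragment names (`frag_a` and so on), the toy training set, and the `min_support` and `min_precision` thresholds are all hypothetical and do not come from the CPDB.

```python
from collections import Counter


def learn_rules(training_set, min_support=2, min_precision=0.8):
    """Return {fragment: precision} for fragments seen mostly in active compounds.

    training_set is a list of (fragments, is_active) pairs, where fragments
    is a set of hypothetical substructure labels.
    """
    total = Counter()     # how many compounds contain each fragment
    positive = Counter()  # how many of those compounds are active
    for fragments, is_active in training_set:
        for frag in fragments:
            total[frag] += 1
            if is_active:
                positive[frag] += 1

    rules = {}
    for frag, n in total.items():
        precision = positive[frag] / n
        # Keep a rule only if it is supported by enough compounds
        # and is right often enough on the training data.
        if n >= min_support and precision >= min_precision:
            rules[frag] = precision
    return rules


def predict(rules, fragments):
    """Predict activity and report which rules fired (the explanation)."""
    fired = sorted(f for f in fragments if f in rules)
    return bool(fired), fired


# Hypothetical toy data: (set of fragments, active in the assay?)
TRAINING = [
    ({"frag_a", "frag_b"}, True),
    ({"frag_a", "frag_c"}, True),
    ({"frag_b", "frag_c"}, False),
    ({"frag_c"}, False),
    ({"frag_a", "frag_d"}, True),
]

rules = learn_rules(TRAINING)
print(rules)                                  # {'frag_a': 1.0}
print(predict(rules, {"frag_a", "frag_c"}))   # (True, ['frag_a'])
print(predict(rules, {"frag_b", "frag_d"}))   # (False, [])
```

Each prediction comes with the list of fragments that triggered it, which is the sense in which such a model is interpretable: a chemist can inspect the rule rather than a set of opaque weights. An ILP system would learn relational rules over atoms and bonds instead of flat fragment labels.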