Review
Joint Laboratory of Biotechnology and Enzyme Catalysis, Institute of Catalysis and Surface Chemistry,
Polish Academy of Sciences, Niezapominajek 8, PL 30-239 Kraków, Poland
Abstract:
Background: Artificial Neural Networks (ANNs) are introduced as robust and versatile tools in quantitative structure-activity relationship (QSAR) modeling. Their application to the modeling of enzyme reactivity is discussed, along with methodological issues. Methods of input variable selection, optimization of the network's internal structure, data set division and model validation are discussed. The application of ANNs in the modeling of enzyme activity over the last 20 years is briefly recounted.
Methods: The discussed methodology is exemplified by the case of ethylbenzene dehydrogenase (EBDH). The Intelligent Problem Solver and genetic algorithms are applied for input vector selection, whereas k-means clustering is used to partition the data into training and test cases.
Results: The obtained models exhibit a high correlation between the predicted and experimental values (R2 > 0.9). Sensitivity analyses and the study of response curves are used as tools for the physicochemical interpretation of the models in terms of the EBDH reaction mechanism.
Conclusions: Neural networks are shown to be a versatile tool for the construction of robust QSAR models that can be applied to a range of aspects important in drug design and the prediction of biological activity.
Key words:
artificial neural network, multi-layer perceptron, genetic algorithm, ethylbenzene dehydrogenase, biological activity
The selection of appropriate functions to be utilized by the processing elements is typically conducted with automatic or heuristic methods, or is proposed by the authors of the ANN modeling application, and is not typically of great concern to the ANN user.

ANNs learn using examples that are presented to the network in iterations (called epochs) and minimize the error of their predictions. In contrast to many non-linear optimization protocols, this process does not require any a priori parameters to reach a minimum, as the initial parameters (weights) are assigned randomly. Advanced training algorithms are very efficient in the minimization of the prediction error and are able to circumvent the problem of local minima. That efficiency does not imply, however, that the problem is solved automatically, regardless of the structure and characteristics of the chosen neural model. The selection of appropriate training algorithms and their performance parameters (such as, for example, the learning rate η used in back propagation, quick propagation and delta-bar-delta, the learning momentum, or Gaussian noise) frequently has an important influence on a model's training time and its final quality (see the Training section).

ANNs comprise informatic neurons that collect inputs and process them with non-linear functions. This procedure automatically eliminates the need for the introduction of cross-terms or quadratic terms [65]. The mutual interactions, if necessary, will emerge in the network structure. Moreover, non-linearity is essentially a structural ANN feature, which allows more complex modeling (i.e., the creation of a much more complex response surface than is possible in the case of linear models).

ANNs exhibit great versatility in the way the dependent variable is predicted, i.e., they can work in either a regression or a classification regime. In the former case, a continuous dependent variable is predicted, whereas in the latter, it is provided in the form of non-continuous categorical classifiers (for example, 0 for non-active, 1 for active, 2 for very active). Moreover, as the non-linearity is built in, the relationship of the predictors to the predicted dependent variable does not have to be continuous. For example, it is possible to model enzyme substrate conversion rates together with inhibitors, although there is no common scale for both groups.

ANNs typically exhibit higher generalization capabilities than linear models. As such, they are predestined to capture global relationships, in contrast to the local ones characteristic of linear models.

All of these advantages are so powerful that one can wonder why neural networks have not superseded traditional linear approaches in structure-activity research. The answer is very simple: artificial neural networks have an important drawback that limits their application. In most cases, the robustness of neural models comes at a cost: the algorithms are too complex to be analyzed by their users. As a result, there is no straightforward method for the physicochemical interpretation of such models. Thus, there is a simple tradeoff between prediction quality and interpretability. As neural networks are better prepared to model complex phenomena, they are much harder to understand than simple linear models. Moreover, as complex problems can frequently be solved in more than one way, it is very common to obtain a range of good ANN models that utilize different descriptors. As a result, there is no single ultimate answer for a given problem, which yields additional uncertainty in the interpretation. Nevertheless, as ANNs are models that offer theoretical representations of the studied problem, in principle it is possible to study them and gain insight into the described process.

Structure of neural networks

There are numerous different types of ANNs. A description of these types is beyond the scope of this paper; readers interested in this specific subject should refer to an appropriate textbook on neural science [8, 12, 41]. As feed-forward Multi-Layer Perceptrons (MLPs) are most frequently utilized in QSAR research, they are also used as the example in this paper. Their short description is provided below.
MLPs are built from artificial neurons (Fig. 1), frequently referred to as processing elements (PEs). Each neuron collects signals from other neurons (through artificial synapses) and assigns a different importance to each signal. This assignment is typically achieved by a linear combination of each signal xi multiplied by an adjustable weight constant wi, plus a constant θ used as a neuronal activity threshold (the aggregation or transfer function z).

The result of the aggregation function z is subjected to further mathematical processing by the so-called activation function σ(z), which is responsible for the introduction of non-linearity into the model. Although there is a range of different types of activation functions, the most frequently used are logistic or continuous log-sigmoid functions, which allow the smooth adjustment of the neuron activation level to the given set of inputs. The result of the activation function, y, is further transmitted along an artificial connection (axon) to become an input signal for other PEs.
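For orientation, a single processing element can be written in a few lines. This is a minimal sketch, not taken from the original paper: the input values, weights and threshold below are hypothetical, and the logistic function stands for the log-sigmoid activation described above.

```python
import math

def processing_element(inputs, weights, theta):
    """One PE: aggregation z = sum(w_i * x_i) + theta, then log-sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + theta   # transfer function z
    return 1.0 / (1.0 + math.exp(-z))                         # activation y, sent on to other PEs

# Hypothetical neuron with three input signals and their synaptic weights
print(processing_element([0.2, -1.0, 0.5], [0.8, 0.1, -0.4], theta=0.3))
```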
As mentioned above, the most common neural architecture utilized in QSAR research is the feed-forward MLP (Fig. 2). In such a network, the neurons are aligned in three (rarely more) layers, and the signal is transferred in only one direction, from the input neurons toward the output neurons. In the input layer, the PEs are responsible for the introduction of the descriptors …

Fig. 2. A schematic representation of an MLP neural network. Triangles: input neurons; gray rectangles: hidden neurons; white rectangle: output neuron; circles: input and output numerical variables. MLP 11-6-1: 11 input neurons, 6 hidden neurons, 1 output neuron

Training

ANN training methods can be divided into two general groups: supervised and unsupervised. In both cases, the ANN maps the input data into its structure, but during supervised learning, a dependent variable is also provided. As a result, the ANN is trained to associate the characteristics of the input vectors with a particular value or class (the dependent variable). In contrast, in the case of unsupervised learning, no dependent variable is provided, and the resulting map of cases is based on the intrinsic similarities of the input data. The most common unsupervised networks are Kohonen networks (Self-Organizing Maps) [31], which map an N-dimensional input space onto a 2D map of Kohonen neurons. Such an approach is excellent for the investigation of complex data structures.

In structure-activity relationship (SAR) research, ANNs with supervised learning are most frequently encountered. Therefore, the description of this training method is the most relevant for the topic of this paper.

The mean difference between the predicted values of the dependent variable and the experimental values determines the error of prediction. The main aim of the training algorithm is to decrease this error. This reduction is achieved by so-called back-propagation algorithms, which are based on the value and gradient of the error. These algorithms adjust the weights of the network's PEs in the previous layer (thus propagating an error signal back) in a step-by-step manner, aiming at the minimization of the final error. As a result, the ANN learns by example what prediction should be made for a particular set of inputs and adjusts its mathematical composition in such a way as to be the most efficient. There are a number of different algorithms available, but the back-propagation algorithm was the first one; its discovery was a cornerstone of the application of ANNs in all fields of science [12].
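In the standard textbook formulation (our notation, not a quotation from [12]), the prediction error and the back-propagation weight update can be written as:

```latex
E \;=\; \tfrac{1}{2}\sum_{p}\bigl(y_{p}^{\mathrm{pred}}-y_{p}^{\mathrm{exp}}\bigr)^{2},
\qquad
\Delta w_{ij}(t) \;=\; -\,\eta\,\frac{\partial E}{\partial w_{ij}} \;+\; \alpha\,\Delta w_{ij}(t-1)
```

where η is the learning rate and α is the coefficient of the optional momentum term, both of which are discussed below.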
The price paid for the greater (non-linear) modeling power of neural networks is that, although it is possible to adjust a network to lower its error, one can never be sure that the error could not be still lower. In terms of optimization, this limitation means that one can determine a minimum of the network error but cannot know whether it is the absolute (global) minimum or simply a local one.

To determine the minimum error, the concept of the error surface is applied. Each of the N weights and biases of the network (i.e., the free parameters of the model) is taken to be a dimension in space, and the (N + 1)-th dimension is the network error. For any possible configuration of weights, the error for all of the training data can be calculated and plotted in this (N + 1)-dimensional space, forming an error hypersurface. The objective of network training is to find the lowest point on this hypersurface. Although the literature describing the learning methods depicts error surfaces as smooth and regular, the error surfaces of neural networks for real tasks are actually very complex and are characterized by a number of unhelpful features, such as local minima, flat spots and plateaus, saddle points, and long, narrow ravines. As it is not possible to determine analytically where the global minimum of the error surface lies, the training of a neural network is essentially an exploration of the error surface. From an initially random configuration of weights and thresholds, the training algorithms incrementally search for the global minimum. Typically, the gradient of the error surface is calculated at the current point and used to make a downhill move.

The extent of the weight correction is controlled by the learning rate parameter η (also called the learning constant). The value of η influences the rate of model training: for learning rates that are too small, the final (optimal) set of weights will be approached at a very slow pace, whereas values of the parameter that are too high may result in jumping over the best descent path to the global minimum of the error surface and may destabilize the training procedure. In most cases of ANN application, the character of the error surface is unknown, and the learning rate value has to be selected essentially arbitrarily; it can only be tuned through ex-post analysis of the training procedure. Fortunately, only the back-propagation algorithm requires a fixed a priori determination of the η value. Most modern algorithms (such as quick propagation, quasi-Newton-based algorithms or conjugate gradient algorithms) apply an adaptive learning rate, which results in incremental convergence along the search direction [69]. To attain more efficient ANN training, additional tricks are sometimes applied, such as momentum (which maintains the gradient-driven direction of optimization and allows passing over local minima) or Gaussian noise (added to the training input values, also aimed at jumping out of local minima). Eventually, the algorithm stops at a low point, which hopefully is the global minimum.

Of course, ANNs are prone to the same over-fitting problems as any other model. The origin of the problem is the so-called over-training of the network. One possible description of over-training is that the network first learns the gross structure and then the fine structure that is generated by noise [65], or that it accommodates too closely to the structure of the learning data. The problem is handled by splitting the database into training and test sets [60]. The error of the test set is monitored during the ANN optimization procedure but is not used for ANN training. If the error of the test set starts to rise while the error of the training set is still decreasing, the training of the ANN is halted to avoid over-training (which can be compared to a student learning by heart).

As, to a certain extent, the test set indirectly determines the training process (the decision when to stop), it is also advisable to set aside yet another validation subset, which is used as an ultimate robustness cross-check. If the errors obtained for the validation and test sets are in a similar range, one can be fairly sure that the model is not over-fitted and retains high generalization power. Moreover, although the error of the training set is, in most cases, lower than those of the test and validation sets, it should also be in a similar range. If the training error is many orders of magnitude lower than those obtained for the validation and test sets, it is a sure indication of over-fitting. In other words, the neural model is too well accommodated to describing the training cases and likely will not have high prediction and generalization capabilities. In such a case, the training of the model should be repeated and/or the initial neural architecture modified to avoid over-training.
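The early-stopping policy just described can be stated compactly. This is a minimal sketch under stated assumptions: `step` and `test_error` are hypothetical callbacks standing for one training epoch and the evaluation of the held-out test-set error of the network.

```python
def train_with_early_stopping(step, test_error, max_epochs=10000, patience=20):
    """Halt training once the test-set error stops improving (over-training guard)."""
    best_error, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        step()                                  # one weight-update epoch on the training set
        err = test_error()                      # monitored, but never used for the update itself
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:    # test error has kept rising: stop training
            break
    return best_error
```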
Data partitioning

As was mentioned above, before training starts, the available data set has to be partitioned into at least …
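As the Abstract notes, k-means clustering is one way to perform this partitioning: compounds are grouped in descriptor space, and test cases are drawn from every cluster so that the test set covers the same region as the training set. The sketch below is a minimal illustration of that idea, not the exact protocol of the paper; it assumes a numeric descriptor matrix X and the availability of scikit-learn's KMeans.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_partition(X, n_clusters=8, test_fraction=0.2, seed=0):
    """Draw test compounds from every descriptor-space cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    rng = np.random.default_rng(seed)
    test_idx = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        n_test = max(1, round(test_fraction * len(members)))   # at least one test case per cluster
        test_idx.extend(rng.choice(members, size=n_test, replace=False))
    test_idx = np.sort(np.array(test_idx))
    train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
    return train_idx, test_idx
```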
… QSAR. Although PCA is a very effective strategy for decreasing the number of dimensions of the input vector, the obtained PCA components do not have straightforward physicochemical interpretations, which renders them less valuable for explanatory models.

ANNs are also subject to the dimensionality problem [64]. The selection of the most important variables facilitates the development of a robust model. There are a number of approaches that can be used:

1. Removal of redundant and low-variance variables. In the first place, descriptors that have a pairwise correlation coefficient greater than 0.9 should be removed manually. In such a case, one of the two redundant descriptors is removed, as it does not introduce novel information and typically will not improve the robustness of the model. However, several ANN researchers suggest that, as long as both descriptors originate from different sources, they should be left in place due to possible synergetic effects they can exert on the model. Moreover, it is advisable to remove descriptors that do not introduce valuable information into the data set; low variance of a variable is typically an indication of such a situation. Such an approach was used, for example, by Kauffman and Jurs as the first step in their objective feature selection protocol [27] (a minimal filtering sketch follows this list).
2. Brute-force experimental algorithms. As long as the training data set is relatively small (and typically, in the case of QSAR applications, the data set does not exceed 100 compounds), one can attempt an experimental data selection approach. Due to the high performance of modern computers, it is possible to build thousands of ANNs with different input vectors within an hour's time. If a uniform training algorithm is applied and the selection is based on the value of the test set error, such an approach allows the selection of the most robust input vector. An example of such an algorithm, the Intelligent Problem Solver (IPS), can be found in Statistica Neural Networks 7.0. Examples of applications of that strategy can be found in our previous papers [57, 59, 61].

3. Genetic algorithm linear models. Another technique that is extremely efficient applies a genetic algorithm (GA) for the selection of the input vector [44] (see the sketch after this list). Genetic algorithms utilize evolutionary principles to evolve an equation that is best adapted to the environment (i.e., to the problem to be solved). The evolutionary pressure is introduced by a fitness function (based on the correctness of the prediction of the dependent variable) that determines the probability of a given specimen to mate and produce offspring equations. The optimization of the initially randomly distributed characteristics (i.e., input variables) is assured by cross-over between the fittest individuals and by accidental mutations (insertion or deletion of individual feature-variables). However, in the case of neural networks, GA optimization becomes somewhat trickier than in the case of traditional linear models. It is important to consider the type of fitness function that is applied to select the best equation. In most cases, the genetic algorithms are intended to develop multiple regression models, and the fitness function assesses how well the obtained linear model predicts the desired dependent variable. Such algorithms favor the selection of variables that have a linear correlation with the predicted value and that do not necessarily have to be the best suited for the construction of a non-linear neural network. If the problem can be solved well in a linear regime, the introduction of more flexible non-linear networks will not improve the performance [13]. However, if the problem is not linear, it is advisable to apply as many non-linear features in the GA as possible. These features can be binary or quadratic interactions as well as splines (inspired by the MARS algorithm of Friedman [14, 51]). Although splines allow interpolative models to be built, achieving response surfaces of considerable complexity from simple building blocks, it should be understood that such models no longer hold direct physicochemical interpretations and, as such, should be used with caution.

4. Genetic algorithm neural models. Another approach to input selection is the use of genetic algorithms in conjunction with neural networks (so-called Genetic Neural Networks, GNNs). The difference from the approach described above is that a GNN evolutionary population comprises neural networks instead of linear equations. As a result, the fitness function assesses the prediction quality of a particular neural network instead of a linear equation. Other than this aspect, the process is basically similar: it involves a cross-over of the parent models and the breeding of offspring models with characteristics of both parents (in this case, parts of the input vector and sometimes structural features of the parent models). Moreover, a mutation possibility is allowed that introduces or eliminates certain input descriptors. Such an approach was carefully analyzed by So and Karplus in their series of papers [47-49] and in the study of Yasri and Hartsough [68]. The main advantage of such an approach is that the fitness of …
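As a concrete illustration of approaches 1 and 3, the sketch below first removes low-variance and pairwise-redundant descriptors and then runs a minimal genetic algorithm whose fitness is the R2 of an ordinary least-squares model. All names and data are hypothetical placeholders (a random 46 x 20 descriptor matrix stands in for a QSAR table), and the fitness is evaluated on the training data for brevity; in practice it should be based on a test-set error, as discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical QSAR data: 46 compounds x 20 descriptors and a synthetic activity
X = rng.normal(size=(46, 20))
y = X[:, 0] - 2.0 * X[:, 3] + 0.5 * rng.normal(size=46)

def filter_descriptors(X, var_tol=1e-3, r_max=0.9):
    """Approach 1: drop low-variance columns, then one of each pair with |r| > r_max."""
    selected = []
    for j in range(X.shape[1]):
        if X[:, j].var() <= var_tol:
            continue
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= r_max for k in selected):
            selected.append(j)
    return selected

def fitness(mask):
    """Approach 3 fitness: R2 of a least-squares linear model on the chosen inputs."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return -np.inf
    A = np.column_stack([X[:, cols], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def ga_select(n_desc, pop=30, gens=40, p_mut=0.05):
    """Evolve binary input-selection masks by cross-over and mutation."""
    population = rng.random((pop, n_desc)) < 0.3          # random initial input vectors
    for _ in range(gens):
        scores = np.array([fitness(m) for m in population])
        parents = population[np.argsort(scores)[::-1][: pop // 2]]  # evolutionary pressure
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_desc)                 # single-point cross-over
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_desc) < p_mut             # mutation: insert/delete a variable
            children.append(np.where(flip, ~child, child))
        population = np.vstack([parents, children])
    return np.flatnonzero(max(population, key=fitness))

X = X[:, filter_descriptors(X)]
print("GA-selected descriptor columns:", ga_select(X.shape[1]))
```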
… the application of a committee of nonlinear neural networks has a specifically beneficial effect on the quality of the prediction in cases in which a high degree of uncertainty and variance may be present in the data sets. An interesting example of a similar approach by Bakken and Jurs has been applied to the modeling of the inhibition of human type 1 5α-reductase by 6-azasteroids [3]. The authors constructed multiple linear regression, PCA, PLS and ANN models predicting the pKi values of the inhibitors of 5α-reductase and the compounds' selectivity toward 3β-hydroxy-Δ5-steroid dehydrogenase/3-keto-Δ5-steroid isomerase. Again, the neural network models appeared to have higher predictive power than their linear counterparts. An interesting and original application of ANNs can be found in the paper of Funar-Timofei et al., who developed MLR and ANN models of the enantioselectivity of the oxirane ring-opening reaction catalyzed by epoxide hydrolases [15]. The application of neural models significantly improved the prediction relative to that of the linear models.

Apart from the most common MLPs, a number of different neural architectures have been successfully applied to studies on enzyme activity. In addition to the already-mentioned CP-ANNs, Bayesian regularized neural network (BRANN [7]) models were used with success by Polley et al. [42] and Gonzalez et al. [18] to predict the potency of thiol and non-thiol peptidomimetic inhibitors of farnesyltransferase and the inhibitory effects of flavonoid derivatives on aldose reductase [13]. The authors note that the application of the Bayesian rule to the ANN automatically regularizes the training process, avoiding the pitfalls of over-fitting or overly complex solutions. This benefit, according to Burden [7], is due to the nature of the Bayesian rule, which automatically penalizes complex models. As a result, the authors frequently do not use test sets, utilizing all available compounds to train the networks.

Another application of neural networks with a similar architecture can be found in the paper of Niwa [38]. The author used probabilistic neural networks (PNNs) to classify with high efficiency 799 compounds into seven different categories exhibiting different biological activities (such as histamine H3 antagonists, carbonic anhydrase II inhibitors, HIV-1 protease inhibitors, 5-HT2A antagonists, tyrosine kinase inhibitors, ACE inhibitors, and progesterone antagonists). The obtained model is clearly meant to be used for HTS screening purposes (and is thus barely within the scope of this paper). Good experience with PNN models was also reported by Mosier and Jurs [37], who applied that methodology to the modeling of a soluble epoxide hydrolase inhibition data set. The same database was also modeled by McElroy and Jurs [35] with either regression (MLR or ANN) or classification (k-nearest neighbor (kNN), linear discriminant analysis (LDA) or radial basis function neural network (RBFNN)) approaches. Their research demonstrated that non-linear ANN models are better in the regression problems but that kNN classifiers exhibit better performance than ANNs using radial basis functions.

Finally, the input of our own research should also be mentioned here. A handful of models have been derived that predicted the biological activity of ethylbenzene dehydrogenase, an enzyme that catalyzes the oxidation of alkylaromatic hydrocarbons to secondary alcohols. In our research, both regression and classification MLP networks [59] were used, as were Bayesian-type general regression neural networks (GRNNs) [56, 57, 61], which were based on descriptors derived from DFT calculations and simple topological indices. Our approach differed from those presented above in that it was possible to construct a model that predicted the biological activity of both inhibitors and substrates. Moreover, our studies were focused more on gaining insight into the mechanism of the biocatalytic process than on the development of robust QSAR modeling methods. Nevertheless, the discussed issues concerned the random case-partitioning problem resulting in over-training, and the function of the selected ANN architecture [57].

It appears that the presentation of an example of enzyme activity modeling with ANNs might be beneficial to the reader. Therefore, a description of a simple case is presented below, illustrating the problem of enzyme reactivity model construction.

Practical example

Modeling problem

A molybdoenzyme, ethylbenzene dehydrogenase (EBDH), is a key biocatalyst of the anaerobic metabolism of the denitrifying bacterium Azoarcus sp. EbN1. This enzyme catalyzes the oxygen-independent, stereospecific hydroxylation …
Tab. 1. Substrates and inhibitors (n = 46) of EBDH. R1 stands for substituents or modifications of the aromatic ring system (e.g., 1-naphthalene indicates that a naphthalene ring is present instead of a phenyl ring and that R2 is in the 1-position), and R2 describes the structural features of the substituent that is oxidized to an alcohol at the methine position. Symbols such as -CH2CH2CH2- depict the cycloalkyl chains of indane, tetrahydronaphthalene or coumaran; rSA indicates the specific activity relative to EBDH activity with ethylbenzene under standard assay conditions

No. | R1 | R2 | rSA (exp.) | rSA (pred., model 1) | rSA (pred., model 2)

Training
1 | 2-CH2CH3 | CH2CH3 | 2.62 | 21.4 | -6.4
2 | 4-CH2CH3 | CH2CH3 | 64.27 | 63.2 | 43.3
3 | 4-CHOHCH3 | CH2CH3 | 2.68 | 1.7 | 1.2
4 | 1-naphthalene | CH2CH3 | 0 | 1.1 | -20.5
5 | 1H-indene | CH2CH3 | 242 | 241.6 | 246.0
6 | 2-OCH3 | CH2CH3 | 1.1 | 7.2 | 14.7
7 | 2-naphthalene | CH3 | 9.3 | 12.8 | 7.4
8 | 2-SH | CH2CH3 | 0 | 14.0 | 14.0
9 | 2-OH | CH2CH3 | 56.14 | 56.4 | 9.2
10 | 2-py | CH2CH3 | 0 | 7.5 | 2.4
11 | 2-pyrrole | CH2CH3 | 234.63 | 233.9 | 277.0
12 | 2-CH3 | CH2CH3 | 3.8 | 11.7 | 10.4
13 | 2-furan | CH2CH3 | 133.92 | 132.8 | 147.3
14 | 2-thiophene | CH3 | 0 | 3.8 | -2.3
15 | 3-NH2 | CH2CH3 | 25.45 | 34.6 | 11.0
16 | 3-py | CH2CH3 | 3.72 | 6.1 | 3.5
17 | H | CH2CHCH | 282.6 | 286.1 | 244.4
18 | 3-COCH3 | CH2CH3 | 3.47 | 1.1 | 21.4
19 | 4-NH2 | CH2CH3 | 134 | 132.8 | 97.8
20 | 4-OCH3 | CH2CH3 | 313.27 | 314.9 | 273.9
21 | 4-COOH | CH2CH3 | 0 | 0.7 | 24.7
22 | 4-Ph | CH2CH3 | 29.65 | 29.4 | 5.1
23 | 4-OH | CH2CH3 | 259.01 | 253.1 | 241.8
24 | 4-py | CH2CH3 | 0 | 26.1 | 1.3
25 | 2,4-di-OH | CH2CH3 | 362.69 | 373.0 | 302.3
26 | 4-F | CH2CH3 | 6.4 | 6.3 | -2.4
27 | p-OCH3 | -CH2CH2CH2- | 122.12 | 122.3 | 93.2
28 | naphthalene | 1,8-CH2-CH2- | 0 | 2.3 | 16.3
29 | H | -CH2-CH2-O- | 80 | 73.8 | 86.2
30 | H | CH2CH3 | 100 | 96.1 | 60.4
31 | H | -CH2CH2CH2- | 80.34 | 86.1 | 72.3
32 | H | CH2CH2CH3 | 14 | 16.1 | 33.8
33 | H | CH3 | 0 | 8.3 | 17.9
34 | H | CHOHCH3 | 0 | 3.6 | 7.7

Test
35 | 2-NH2 | CH2CH3 | 94.53 | 76.2 | 58.0
36 | 2-thiophene | CH2CH3 | 242.92 | 217.8 | 173.1
37 | 2-furan | CH3 | 0 | 5.8 | 15.2
38 | 3-CH3 | CH2CH3 | 10 | 9.1 | 40.7
39 | 4-CH2OH | CH2CH3 | 7.4 | 24.5 | 34.4
40 | 4-CH3 | CH2CH3 | 28 | 15.7 | 52.9
41 | 4-OH | CH2CH2CH3 | 180 | 206.8 | 124.4
42 | p-NH2 | -CH2CH2CH2- | 33 | 12.7 | 47.6

Validation
43 | 2-Br | CH2CH3 | 0 | 11.5 | 21.3
44 | 3-OH | CH2CH3 | 24.31 | 12.3 | 34.4
45 | H | CH2CH=CH2 | 281.5 | 226.4 | 258.8
46 | H | -CH2CH2CH2CH2- | 4.45 | 9.8 | 86.1
Fig. 5. (A) An example of topological encoding: number of substituents: 2; number of heavy atoms in the longest substituent: 2; number of heavy atoms in all substituents: 3; localization of the substituents (para, meta and ortho) in the aromatic ring relative to the ethyl group: (1,0,0). (B) Mulliken charge analysis of 4-ethylpyridine (highest charge: 0.146; lowest charge: -0.434) and the vibration mode of C-H bond stretching [59]
Model | R2 | Test R2 | Validation R2 | Mean pred. error | Learning error | Test error | Validation error
Model 1 (MLP 11-6-1) | 0.984 | 0.957 | 0.996 | 8.56% | 0.0174 | 0.0396 | 0.0654
Model 2 (MLP 12-7-1) | 0.929 | 0.953 | 0.874 | 23.33% | 0.0545 | 0.0856 | 0.0970
Fig. 6. Scatter plots of the experimental relative specific activity (rSA) against the predicted rSA for (A) model 1, the MLP 11-6-1 network (R2 = 0.986), and (B) model 2, the MLP 12-7-1 network (R2 = 0.929). Empty circles depict training cases, and the filled circles depict test cases (black) and validation cases (grey); the dashed line depicts the 95% confidence interval of the model
… the extent of activity for the test and validation cases, although model 1 exhibits markedly better performance. It should be underlined here that the obtained models are far more complex than advised by the rule derived by Andrea et al. However, rigorous model cross-validation with the test and validation cases demonstrates their high generalization capabilities. This finding again proves that the optimal ρ interval is a problem-dependent issue.

Interpretation

Very often, an understanding of the factors influencing enzyme catalysis is more important than even the best prediction tool. The interpretation of ANN models, although not as straightforward as in the case of MLR, is possible by two means: i) sensitivity analysis and ii) response curves and surfaces. Sensitivity analysis (see Tab. 3) provides a ranking of the variables used in the model [19]. However, as our example has shown, ANNs are able to achieve similar prediction levels by different approaches. Therefore, it is advisable to perform a sensitivity analysis on a group of models and draw conclusions from the repetitive occurrence of the important variables; one common recipe is sketched below.

In our example, both models suggest that occupation of the para position is one of the most important factors, as is a substituent's ability to stabilize the carbocation form. Additionally, the energy of the HOMO orbital and ΔΔGradical are ranked comparably high.

Tab. 3. The results of the sensitivity analysis for model 1 and model 2. The ranks provide the descriptors' order of importance in the model
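The sketch below implements the mean-replacement (error-ratio) variant of sensitivity analysis, which is one of several recipes in use; `model_predict` is a hypothetical stand-in for the trained network's prediction function.

```python
import numpy as np

def sensitivity_ranking(model_predict, X, y, names):
    """Rank descriptors by the error ratio obtained when one input is
    replaced by its mean value (a higher ratio means a more important variable)."""
    base = np.mean((model_predict(X) - y) ** 2)
    ratios = {}
    for j, name in enumerate(names):
        X_probe = X.copy()
        X_probe[:, j] = X[:, j].mean()      # wipe out the information in descriptor j
        ratios[name] = np.mean((model_predict(X_probe) - y) ** 2) / base
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
```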
More information can be gained from the response curves, which depict the dependent variable (here, rSA) as a function of an individually selected descriptor (typically holding the remainder of the inputs at their average values, although various other approaches have also been used [2]). As a result, one obtains a 2D slice through the multidimensional response hypersurface.
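A minimal sketch of such a slice, with `model_predict` again standing for the trained network:

```python
import numpy as np

def response_curve(model_predict, X, var_index, n_points=50):
    """Vary one descriptor over its observed range while holding the
    remaining inputs at their mean values (a 2D slice of the hypersurface)."""
    grid = np.linspace(X[:, var_index].min(), X[:, var_index].max(), n_points)
    probe = np.tile(X.mean(axis=0), (n_points, 1))
    probe[:, var_index] = grid
    return grid, model_predict(probe)   # plot the second output against the first
```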
In our case, one can analyze the influence of ΔΔGcarbocation and ΔΔGradical on the rSA. Both models provide non-linear relationships between rSA and both descriptors, and they predict high values of rSA for negative (high stabilization) values of ΔΔGcarbocation and ΔΔGradical (Fig. 7A, B, E, F). The influence of a steric effect can be analyzed in the same way for the ZPE in the case of model 1 (Fig. 7C) and for the number of heavy atoms in the longest substituent descriptor in the case of model 2 (Fig. 7G). In both cases, low activity is predicted for high values of the descriptors, which indicates a negative steric effect on the specific activity. The influence of hydrophobicity can be analyzed by following the response of the difference between the maximum and minimum partial charges (Δq NBO, Fig. 7D) or of the fractional polar surface (Fig. 7H). In both cases, the obtained relationships indicate that non-polar, ethylbenzene-like compounds will be of higher activity. It should also be noted that, as these two variables are less important than the top-ranked ones (e.g., ΔΔGcarbocation), changes in their values exert a smaller effect on rSA.

One should be aware, however, that the analysis of response curves might sometimes be misleading, as the 2D projection of ANN behavior is far from the real situation. In a real compound, one can hardly change ΔΔGcarbocation without changing ΔΔGradical. A glimpse of the complexity of the true response hypersurface can be gained through an analysis of the 3D response surfaces (Fig. 8), which shows that simple conclusions drawn from response curves might not be entirely true over the whole range of values. The figure predicts a high activity for compounds with a high stabilization of the carbocation and low stabilization of the radical, which appears chemically counterintuitive. Therefore, it is always advisable to study ANN models with real chemical probes, i.e., to design in silico a new compound with the desired characteristics and use the obtained model for prediction of its activity.

Nevertheless, all of these findings appear to be in accordance with our recent results on the EBDH reaction mechanism. Both the radical (TS1) and the carbocation (TS2) barriers are of similar height and therefore influence the observed kinetics. Moreover, the enzyme has evolved to host non-polar hydrocarbons, such as ethylbenzene and propylbenzene, which explains why less polar compounds are favored in the reactivity. However, the electronic stabilization of the transition states introduced by heteroatoms, especially in a resonance-conjugated para position, appears to overcome the polarity problems. This situation results in a specific activity higher than that of ethylbenzene for compounds such as 4-ethylphenol or 4-ethylresorcinol.
42. … using a Bayesian Regularized Neural Network. J Med Chem, 2004, 47, 6230-6238.
43. Rodakiewicz-Nowak J, Nowak P, Rutkowska-Zbik D, Ptaszek M, Michalski O, Młynarczuk G, Eilmes J: Spectral and electrochemical characterization of dibenzotetraaza[14]annulenes. Supramol Chem, 2005, 17, 643-647.
44. Rogers D, Hopfinger AJ: Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J Chem Inf Comput Sci, 1994, 34, 854-866.
45. Rutkowski L, Scherer R, Tadeusiewicz R, Zadeh LA, Zurada JM (Eds.): Artificial Intelligence and Soft Computing. Springer-Verlag, Berlin, 2010.
46. Silva J, Costa Neto E, Adriano W, Ferreira A, Gonçalves L: Use of neural networks in the mathematical modelling of the enzymic synthesis of amoxicillin catalysed by penicillin G acylase immobilized in chitosan. World J Microbiol Biotechnol, 2008, 24, 1761-1767.
47. So S-S, Karplus M: Evolutionary optimization in quantitative structure-activity relationship: An application of genetic neural networks. J Med Chem, 1996, 39, 1521-1530.
48. So S-S, Karplus M: Genetic neural networks for quantitative structure-activity relationships: Improvements and application of benzodiazepine affinity for benzodiazepine/GABAA receptors. J Med Chem, 1996, 39, 5246-5256.
49. So S-S, Karplus M: Three-dimensional quantitative structure-activity relationships from molecular similarity matrices and genetic neural networks. 1. Method and validations. J Med Chem, 1997, 40, 4347-4359.
50. So SS, Richards WG: Application of neural networks: quantitative structure-activity relationships of the derivatives of 2,4-diamino-5-(substituted-benzyl)pyrimidines as DHFR inhibitors. J Med Chem, 1992, 35, 3201-3207.
51. Sutherland JJ, O'Brien LA, Weaver DF: Spline-fitting with a genetic algorithm: a method for developing classification structure-activity relationships. J Chem Inf Comput Sci, 2003, 43, 1906-1915.
52. Szaleniec M, Borowski T, Schuhle K, Witko M, Heider J: Ab initio modeling of ethylbenzene dehydrogenase reaction mechanism. J Am Chem Soc, 2010, 132, 6014-6024.
53. Szaleniec M, Dudzik A, Pawul M, Kozik B: Quantitative structure enantioselective retention relationship for high-performance liquid chromatography chiral separation of 1-phenylethanol derivatives. J Chromatogr A, 2009, 1216, 6224-6235.
54. Szaleniec M, Hagel C, Menke M, Nowak P, Witko M, Heider J: Kinetics and mechanism of oxygen-independent hydrocarbon hydroxylation by ethylbenzene dehydrogenase. Biochemistry, 2007, 46, 7637-7646.
55. Szaleniec M, Salwiński A, Borowski T, Heider J, Witko M: Quantum chemical modeling studies of ethylbenzene dehydrogenase activity. Int J Quantum Chem, 2012, 112, 1990-1999.
56. Szaleniec M, Tadeusiewicz R, Skoczowski A: Optimization of neural models based on the example of assessment of biological activity of chemical compounds. Comput Methods Mater Sci, 2006, 6, 65-80.
57. Szaleniec M, Tadeusiewicz R, Witko M: How to select an optimal neural model of chemical reactivity? Neurocomputing, 2008, 72, 241-256.
58. Szaleniec M, Witko M, Heider J: Quantum chemical modelling of the C-H cleavage mechanism in oxidation of ethylbenzene and its derivates by ethylbenzene dehydrogenase. J Mol Catal A Chem, 2008, 286, 128-136.
59. Szaleniec M, Witko M, Tadeusiewicz R, Goclon J: Application of artificial neural networks and DFT-based parameters for prediction of reaction kinetics of ethylbenzene dehydrogenase. J Comput Aided Mol Des, 2006, 20, 145-157.
60. Tadeusiewicz R: Neural Networks (in Polish). Akademicka Oficyna Wydawnicza, Warszawa, 1993.
61. Tadeusiewicz R: Using neural models for evaluation of biological activity of selected chemical compounds. In: Applications of Computational Intelligence in Biology, Current Trends and Open Problems, Studies in Computational Intelligence. Springer-Verlag, Berlin Heidelberg New York, 2008, 135-159.
62. Tadeusiewicz R: Neural network as a tool for medical signals filtering, diagnosis aid, therapy assistance and forecasting improving, in image processing, biosignals processing, modelling and simulation, biomechanics. IFMBE Proceedings. Eds. Dössel O, Schlegel WC, Springer-Verlag, Berlin, Heidelberg, New York, 2009, 1532-1534.
63. Tadeusiewicz R: New trends in neurocybernetics. Comput Methods Mater Sci, 2010, 10, 1-7.
64. Tadeusiewicz R, Grabska-Chrząstowska J, Polcik H: Attempt of neural modelling of castings crystallization control process. Comput Methods Mater Sci, 2008, 8, 58-69.
65. Tetko IV, Livingstone DJ, Luik AI: Neural network studies. 1. Comparison of overfitting and overtraining. J Chem Inf Comput Sci, 1995, 35, 826-833.
66. Topliss JG, Edwards RP: Chance factors in studies of quantitative structure-activity relationships. J Med Chem, 1979, 22, 1238-1244.
67. Winkler D: Neural networks as robust tools in drug lead discovery and development. Mol Biotechnol, 2004, 27, 139-167.
68. Yasri A, Hartsough D: Toward an optimal procedure for variable selection and QSAR model building. J Chem Inf Comput Sci, 2001, 41, 1218-1227.
69. Yu XH, Chen GA, Cheng SX: Dynamic learning rate optimization of the backpropagation algorithm. IEEE Trans Neural Netw, 1995, 6, 669-677.
70. Zernov VV, Balakin KV, Ivaschenko AA, Savchuk NP, Pletnev IV: Drug discovery using support vector machines. The case studies of drug-likeness, agrochemical-likeness, and enzyme inhibition predictions. J Chem Inf Comput Sci, 2003, 43, 2048-2056.

Received: February 7, 2012; in revised form: April 2, 2012; accepted: April 16, 2012.