Olexandr Isayev
and Prof. Denis Fourches
Laboratory for Molecular Modeling,
University of North Carolina at Chapel Hill, USA
Decline in Pharmaceutical R&D efficiency
The cost of developing a new drug roughly doubles every nine years.
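As a rough reading of the doubling claim, assuming the trend is a smooth exponential (the symbols C_0 and t are introduced here for illustration and do not appear on the slides):

```latex
C(t) = C_0 \cdot 2^{t/9}, \qquad \frac{C(t+1)}{C(t)} = 2^{1/9} \approx 1.08
```

Under that assumption, a nine-year doubling time corresponds to cost growth of about 8% per year.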
Virtual Screening
- Input: ~10^6 to 10^9 molecules
- Stages: empirical rules/filters, similarity search, consensus QSAR models
- Output: ~10^2 to 10^3 molecules (potential hits)
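The "empirical rules/filters" stage of the funnel can be illustrated with a small sketch. The slides do not say which rules were used; Lipinski's rule of five is swapped in here as a familiar example, and the compound records, field names, and values are hypothetical.

```python
# Hypothetical compound records; field names and values are illustrative only.
compounds = [
    {"id": "cpd-1", "mol_wt": 320.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    {"id": "cpd-2", "mol_wt": 712.9, "logp": 6.3, "h_donors": 6, "h_acceptors": 12},
    {"id": "cpd-3", "mol_wt": 455.0, "logp": 5.4, "h_donors": 1, "h_acceptors": 7},
]

def lipinski_violations(c):
    """Count violations of Lipinski's rule of five for one record."""
    return sum([
        c["mol_wt"] > 500,      # molecular weight above 500 Da
        c["logp"] > 5,          # logP above 5
        c["h_donors"] > 5,      # more than 5 hydrogen-bond donors
        c["h_acceptors"] > 10,  # more than 10 hydrogen-bond acceptors
    ])

def passes_rule_of_five(c, max_violations=1):
    """Common convention: tolerate at most one violation."""
    return lipinski_violations(c) <= max_violations

# Keep only the compounds that pass the filter.
survivors = [c["id"] for c in compounds if passes_rule_of_five(c)]
print(survivors)
```

Cheap filters of this kind are typically applied first, so that the more expensive similarity and QSAR stages see a smaller library.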
Quantitative Structure Activity Relationships (QSAR)

Thousands of molecular descriptors are available for organic compounds: constitutional, topological, structural, quantum mechanics based, fragmental, steric, pharmacophoric, geometrical, thermodynamical, conformational, etc.

[Figure: compound structures mapped to a descriptor matrix and a column of activity values; the example activities shown range from -0.222 to 1.195]

Descriptor matrix: samples (compounds) by features (descriptors)

Compound   X1    X2    ...   Xm
1          X11   X12   ...   X1m
2          X21   X22   ...   X2m
...        ...   ...   ...   ...
n          Xn1   Xn2   ...   Xnm

- Building of models using machine learning methods (NN, SVM, RF)
- Validation of models according to numerous statistical procedures, and their applicability domains
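The model-building and validation steps can be sketched with a toy k-nearest-neighbor classifier on a descriptor matrix. This is a minimal illustration, not the authors' workflow: the descriptor values, labels, the distance-based applicability-domain check, and its threshold are all assumptions made for the example.

```python
import math

# Toy descriptor matrix (rows = compounds, columns = descriptors) and binary
# activity labels (1 = active, 0 = inactive). All values are made up.
X_train = [[0.1, 1.2], [0.2, 1.0], [0.9, 0.1], [1.1, 0.3], [0.15, 0.9], [1.0, 0.2]]
y_train = [1, 1, 0, 0, 1, 0]

# External set: compounds never used for model building.
X_ext = [[0.12, 1.1], [0.95, 0.25], [0.3, 0.8]]
y_ext = [1, 0, 1]

def knn_predict(x, X, y, k=3):
    """Majority vote among the k nearest training compounds (Euclidean)."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(x, X[i]))[:k]
    votes = sum(y[i] for i in nearest)
    return 1 if 2 * votes > k else 0

def in_domain(x, X, threshold):
    """Crude applicability-domain check: the nearest training compound
    must lie within `threshold` (an assumed cutoff) of the query."""
    return min(math.dist(x, row) for row in X) <= threshold

predictions = [knn_predict(x, X_train, y_train) for x in X_ext]
accuracy = sum(p == t for p, t in zip(predictions, y_ext)) / len(y_ext)
print(predictions, accuracy)
```

Reporting accuracy only on the external set, and only for compounds inside the applicability domain, is one simple way to estimate how a model would behave during virtual screening.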
Discovery of Novel Antimalarial Compounds Enabled by QSAR-Based Virtual Screening
- Malaria: a severe infectious disease (~700,000 deaths per year worldwide)
- Caused by unicellular eukaryotic parasites, mainly Plasmodium falciparum
- Modeling set: 158 actives, 2,975 inactives; kNN and SVM with ISIDA descriptors
- Screening workflow: ChemBridge database (454,638 chemicals), then similarity and drug-likeness filters (44,112 and 39,944 chemicals remaining), then QSAR models, giving 176 putative hits and 42 putative inactives

EXPERIMENTAL CONFIRMATION (Dr. Guy, St. Jude Res. Hosp.)
- Most potent hit (SJ000565000) with EC50 = 95.6 nM and novel molecular scaffold
- 7 compounds with EC50 less than 2 µM
- 18 compounds with moderate activity (EC50 2-8 µM)
- All of the 42 putative inactives have EC50 > 10 µM

- External predictive power of QSAR models is critical to enable their application to virtual screening.
- It is technically challenging to compute molecular properties and descriptors for more than 10^9 compounds.
- No cheminformatics architecture is able to screen >10^9 compounds.
Chemical Datasets
Largest publicly available virtual libraries
Data Processing (input: chemical library)
- Data parsing from SMILES
- 2D structure generation
- Automatic curation

High-throughput descriptor generator (30M/hr)
- Mol weight, logP, H acceptors/donors, rot bonds, Daylight fingerprints
Storage & Manipulation
- Interactive analytics with IPython
- Indexed, fully searchable, and accessible via high level API, e.g., (data.MolWt > 150) & (data.logP == 3)
- Access in chunks or streaming compound by compound
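To show how an expression such as (data.MolWt > 150) & (data.logP == 3) can be evaluated, here is a toy column store in plain Python. It is not the authors' API: the classes Mask, Column, and Table, the select and chunks methods, and all data values are hypothetical and exist only for this sketch.

```python
class Mask:
    """Row-wise boolean result of a column comparison."""
    def __init__(self, bits):
        self.bits = list(bits)

    def __and__(self, other):
        return Mask(a and b for a, b in zip(self.bits, other.bits))

class Column:
    """One descriptor column; comparisons return a Mask, not a bool."""
    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, threshold):
        return Mask(v > threshold for v in self.values)

    def __eq__(self, target):
        return Mask(v == target for v in self.values)

class Table:
    """Minimal column store: data.MolWt returns the 'MolWt' column."""
    def __init__(self, **columns):
        self._columns = {name: Column(vals) for name, vals in columns.items()}

    def __getattr__(self, name):
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def rows(self):
        """Stream the table compound by compound as dicts."""
        names = list(self._columns)
        for values in zip(*(self._columns[n].values for n in names)):
            yield dict(zip(names, values))

    def select(self, mask):
        """Return the rows where the mask is true."""
        return [row for row, keep in zip(self.rows(), mask.bits) if keep]

    def chunks(self, size):
        """Yield lists of at most `size` rows."""
        chunk = []
        for row in self.rows():
            chunk.append(row)
            if len(chunk) == size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

# Made-up records for illustration.
data = Table(
    id=["cpd-1", "cpd-2", "cpd-3", "cpd-4"],
    MolWt=[120.0, 310.5, 402.2, 98.1],
    logP=[3, 3, 2.5, 3],
)
hits = data.select((data.MolWt > 150) & (data.logP == 3))
print([h["id"] for h in hits])
```

The parentheses in the query matter in Python, because & binds more tightly than > and ==. A production store would also use an index rather than scanning every row.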
Modeling & Screening
- GPU accelerated similarity search (~1M Tanimoto/s)
- Rapid screening of extremely large libraries with multiple molecular probes and QSAR/QSPR models
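A CPU-only sketch of similarity screening against multiple molecular probes follows. It assumes fingerprints are stored as Python integers used as bit sets; the function names, the 8-bit fingerprints, and the 0.6 threshold are hypothetical, and the GPU implementation described on the slide is not reproduced here.

```python
def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")

def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as Python ints.
    Two empty fingerprints are scored 0.0 by convention in this sketch."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0

def screen(library, probes, threshold):
    """Return (compound_id, best_score) for compounds whose best
    similarity to any probe reaches the threshold, best first."""
    hits = []
    for cid, fp in library.items():
        best = max(tanimoto(fp, p) for p in probes)
        if best >= threshold:
            hits.append((cid, best))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Hypothetical 8-bit fingerprints; real fingerprints are far longer.
library = {"cpd-1": 0b10110010, "cpd-2": 0b00001111, "cpd-3": 0b10110000}
probes = [0b10110011, 0b11110000]

hits = screen(library, probes, threshold=0.6)
print(hits)
```

Each library compound is scored independently of the others, which is what makes this workload a natural fit for GPU parallelism.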
GPU - Case Study 1
Fast Computation of Molecular Properties for
Extremely Large Chemical Libraries
- GDB-13: subset of 141 M
- GDB-17: random sample of 50 M
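For a sense of scale, the library sizes above can be combined with the 30M/hr figure quoted for the descriptor generator. This is back-of-the-envelope arithmetic only: it assumes that figure means compounds per hour and that the rate is sustained, neither of which the slides confirm.

```python
# Assumed: the "30M/hr" figure, read as compounds processed per hour.
RATE_PER_HOUR = 30e6

def hours_needed(n_compounds, rate_per_hour=RATE_PER_HOUR):
    """Wall-clock hours to process n_compounds at a constant rate."""
    return n_compounds / rate_per_hour

estimates = {
    "GDB-13 subset (141 M)": hours_needed(141e6),
    "GDB-17 sample (50 M)": hours_needed(50e6),
    "10^9 compounds": hours_needed(1e9),
}
for name, hours in estimates.items():
    print(f"{name}: {hours:.1f} h")
```

Under these assumptions, the two GDB subsets take a few hours each and a billion-compound library takes over a day.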
Funding
- NSF ABI program
- Office of Naval Research
Molecular fingerprints - bit string encodings of structural features
and/or calculated molecular properties.
Tanimoto Coefficient
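For reference, the standard definition for two bit-string fingerprints A and B is given below. The symbols are chosen here: a and b are the numbers of bits set in A and B, and c is the number of bits set in both.

```latex
T(A, B) = \frac{c}{a + b - c} = \frac{|A \cap B|}{|A \cup B|}
```

T ranges from 0 (no shared bits) to 1 (identical fingerprints).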