S4561 Virtual Screening 1b Compound Libraries

Dr. Olexandr Isayev and Prof. Denis Fourches
Laboratory for Molecular Modeling,
University of North Carolina at Chapel Hill, USA
Decline in Pharmaceutical R&D efficiency
The cost of developing a new drug roughly doubles every nine years.
Scannell et al. Nature Reviews Drug Discovery, 2012, 11, 191-200
- 10^33 drug-like chemicals*
- 10^8 compounds in PubChem
- 10^6 compounds in ChEMBL with known bioactivity
Need for novel bio/cheminformatics methods that
(i) Fully exploit the potential of modern chemical biological data streams;
(ii) Reliably forecast compounds' bioactivity and safety profiles;
(iii) Accelerate the translation from basic research to drug candidates.
* Polishchuk, Madzhidov, Varnek. J Comput Aided Mol Des. 2013, 27(8):675-9.
Ligand-based Virtual Screening to identify potential hits
Input: ~10^6 to 10^9 molecules
VIRTUAL SCREENING
- Empirical Rules/Filters
- Similarity Search
- QSAR Models
- Consensus
Output: Potential Hits, ~10^2 to 10^3 molecules
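As an illustration of the "Empirical Rules/Filters" step, here is a minimal sketch of one widely used empirical filter, Lipinski's rule of five. The slide does not say which rules were applied, so the choice of this rule, the dictionary keys, the `max_violations` parameter, and the three compounds with their property values are all assumptions made for the example.

```python
# Lipinski rule-of-five limits. Using this particular rule is an assumption:
# the slide only says "Empirical Rules/Filters".
RULE_OF_FIVE_LIMITS = {
    "mol_wt": 500.0,     # molecular weight, Da
    "logp": 5.0,         # octanol/water partition coefficient
    "h_donors": 5,       # hydrogen-bond donors
    "h_acceptors": 10,   # hydrogen-bond acceptors
}


def count_violations(props):
    """Count how many rule-of-five limits a compound exceeds."""
    return sum(1 for key, limit in RULE_OF_FIVE_LIMITS.items() if props[key] > limit)


def passes_filter(props, max_violations=0):
    """Keep a compound if it exceeds at most `max_violations` limits."""
    return count_violations(props) <= max_violations


# Hypothetical compounds with made-up property values.
library = [
    {"id": "cmpd-1", "mol_wt": 250.3, "logp": 1.2, "h_donors": 2, "h_acceptors": 4},
    {"id": "cmpd-2", "mol_wt": 612.8, "logp": 6.1, "h_donors": 1, "h_acceptors": 7},
    {"id": "cmpd-3", "mol_wt": 480.5, "logp": 5.4, "h_donors": 3, "h_acceptors": 9},
]

strict = [c["id"] for c in library if passes_filter(c)]
lenient = [c["id"] for c in library if passes_filter(c, max_violations=1)]
print(strict, lenient)
```

Allowing one violation is a common relaxation; the stricter default is used here only to keep the example short.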
Quantitative Structure Activity Relationships (QSAR)
Thousands of molecular descriptors are available for organic compounds: constitutional, topological, structural, quantum mechanics based, fragmental, steric, pharmacophoric, geometrical, thermodynamical, conformational, etc.
[Figure: descriptor matrix. Rows are samples (compounds 1 ... n), columns are features (descriptors X1, X2, ..., Xm), and each compound is paired with a measured activity or property value.]
- Building of models using machine learning methods (NN, SVM, RF)
- Validation of models according to numerous statistical procedures, and their applicability domains.
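The descriptor-matrix idea can be sketched with a very small model. The example below uses k-nearest-neighbour regression as a deliberately simple stand-in for the methods named on the slide, together with a crude distance-based applicability-domain check. The descriptor values, activities, `k`, and the `cutoff` are invented for illustration and do not come from the presentation.

```python
import math

# Made-up descriptor matrix: one row per compound, one column per descriptor.
X = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [5.0, 5.0],
    [6.0, 5.0],
]
# Made-up activity values, one per row of X.
y = [0.1, 0.2, 0.3, 1.0, 1.2]


def knn_predict(train_X, train_y, query, k=3):
    """Predict activity as the mean activity of the k nearest training compounds."""
    by_distance = sorted(
        (math.dist(row, query), activity) for row, activity in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(activity for _, activity in nearest) / len(nearest)


def in_domain(train_X, query, cutoff):
    """Crude applicability-domain check: is any training compound within `cutoff`?"""
    return min(math.dist(row, query) for row in train_X) <= cutoff


query = [0.5, 0.5]
prediction = knn_predict(X, y, query, k=3)
print(round(prediction, 3), in_domain(X, query, cutoff=1.0))
```

A prediction for a query that fails the domain check (for example `[20.0, 20.0]` with the same cutoff) should be treated as unreliable, which is the point of defining an applicability domain.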
Discovery of Novel Antimalarial Compounds Enabled by QSAR-Based Virtual Screening
- Severe infectious disease (~700,000 deaths per year worldwide)
- Caused by unicellular eukaryotic parasites, mainly Plasmodium falciparum
- Modeling Set: 158 actives, 2,975 inactives. kNN and SVM with ISIDA descriptors
External predictive power of QSAR models is critical to enable their application to virtual screening.
Screening workflow: Chembridge Database (454,638 chemicals), reduced by drug-likeness and similarity filters to 44,112 and then 39,944 chemicals; QSAR models then gave 176 putative hits and 42 putative inactives.
EXPERIMENTAL CONFIRMATION (Dr. Guy, St Jude Res. Hosp)
- Most potent hit (SJ000565000) with EC50 = 95.6 nM and novel molecular scaffold
- 7 compounds with EC50 less than 2 µM
- 18 compounds with moderate activity (EC50 2-8 µM)
- All of the 42 putative inactives have EC50 > 10 µM
14.2% hit rate >> HTS hit rate (0.1-5%)
Technically challenging to compute molecular properties and descriptors for more than 10^9 compounds.
No cheminformatics architecture is able to screen >10^9 compounds.
Zhang, Fourches, et al. JCIM, 2013, 53, 475-492
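The 14.2% hit rate can be reproduced from the counts on the slide. This short check assumes that the figure counts the 7 potent plus 18 moderately active compounds against the 176 putative hits, which is the reading that matches the quoted number.

```python
putative_hits = 176   # compounds predicted active by the QSAR models
potent = 7            # confirmed compounds in the most potent group
moderate = 18         # confirmed compounds with moderate activity

confirmed = potent + moderate
hit_rate = 100.0 * confirmed / putative_hits
print(f"{confirmed}/{putative_hits} = {hit_rate:.1f}%")  # prints "25/176 = 14.2%"
```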
Study Design
Chemical Datasets
Largest publicly available virtual libraries
- GDB-13 [1]: 955 M compounds
- GDB-13-ABCDE subset: 141 M
- GDB-17 [2] subset: 50 M
[1] Blum and Reymond, 2009, J Am Chem Soc, 131, 8732-8733
[2] Ruddigkeit et al., 2012, J Chem Inf Model, 52, 2864-2875
Setup
Hardware Stack
- Intel Core i7 4770 CPU 3.4GHz
- Intel H87-based motherboard
- 32GB of DDR3 1600 memory
- Nvidia Tesla K20 for GPU accelerated calculations
Software Stack
- Ubuntu 12.04
- Anaconda Scientific Python 2.7.6 Distribution
- Pandas / Pytables
- MKL optimized NumPy
- NUMBAPRO for CPU optimization
- RDKit
- C / CUDA
Data Processing
- Data parsing from SMILES
- 2D structure generation
- Automatic curation
High throughput descriptor generator (30M/hr)
- Mol weight, logP, H acceptors/donors, Rot bonds, Daylight fingerprints.
Pipeline stages shown: Chemical Library, Data Processing, Screening & Modeling
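To put the quoted 30M/hr throughput in perspective, the following back-of-the-envelope sketch estimates processing time for the three libraries named in the deck. It assumes the rate holds uniformly across a whole library, and it uses the 955 M figure given for GDB-13 on the Chemical Datasets slide.

```python
def hours_to_process(n_compounds, rate_per_hour=30e6):
    """Estimated wall-clock hours at a constant descriptor-generation rate."""
    return n_compounds / rate_per_hour


# Library sizes as quoted on the Chemical Datasets slide.
libraries = {
    "GDB-13": 955e6,
    "GDB-13-ABCDE subset": 141e6,
    "GDB-17 subset": 50e6,
}

for name, size in libraries.items():
    print(f"{name}: {hours_to_process(size):.1f} h")
```

Under these assumptions the full GDB-13 set takes roughly 32 hours, while the two subsets finish in under 5 hours each.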

Storage & Manipulation
Chemical Library, processed at high throughput (30M/hr)
- Smaller datasets (<1M) directly allocated in RAM
- Interactive analytics with IPython
- Indexed, fully searchable, and accessible via high level API, e.g.,
  (data.MolWt > 150) & (data.logP == 3)
- Access in chunks or streaming compound by compound.
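The chunked access pattern and the example query can be sketched with the Python standard library alone. This is a stand-in for the Pandas / PyTables API used by the platform, not a reproduction of it; the compound IDs, the property values, and the `iter_chunks` and `select` helper names are hypothetical.

```python
import csv
import io
from itertools import islice

# Hypothetical descriptor table; column names follow the slide (MolWt, logP),
# and all values are made up.
SAMPLE = """id,MolWt,logP
cmpd-1,46.1,-0.3
cmpd-2,122.1,3.0
cmpd-3,151.2,0.5
cmpd-4,158.3,3.0
"""


def iter_chunks(handle, chunk_size):
    """Yield lists of at most `chunk_size` rows so the table never sits fully in RAM."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def select(rows):
    """Row-wise equivalent of (data.MolWt > 150) & (data.logP == 3)."""
    return [r for r in rows if float(r["MolWt"]) > 150 and float(r["logP"]) == 3]


hits = []
for chunk in iter_chunks(io.StringIO(SAMPLE), chunk_size=2):
    hits.extend(select(chunk))

hit_ids = [r["id"] for r in hits]
print(hit_ids)
```

Setting `chunk_size=1` gives the compound-by-compound streaming mode mentioned on the slide.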
Modeling & Screening
- Rapid screening of extremely large libraries with multiple molecular probes and QSAR/QSPR models
- GPU accelerated similarity search: ~1M Tanimoto/s
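Screening a library against several probes at once can be sketched on the CPU as follows. Fingerprints are packed into Python integers so that the shared and total on-bits come from bitwise operations, and only the best-scoring compounds are kept. The GPU implementation itself is not shown in the slides; the fingerprints, compound IDs, and the `screen` helper are invented for this example.

```python
import heapq


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints packed into integers."""
    common = bin(fp_a & fp_b).count("1")   # bits set in both
    total = bin(fp_a | fp_b).count("1")    # bits set in either
    return common / total if total else 0.0


def screen(library, probes, top_k=2):
    """Score each compound by its best similarity to any probe; keep the top_k."""
    scored = (
        (max(tanimoto(fp, probe) for probe in probes), compound_id)
        for compound_id, fp in library
    )
    return heapq.nlargest(top_k, scored)


# Hypothetical 8-bit fingerprints.
probes = [0b11110000, 0b00001111]
library = [
    ("m1", 0b11110000),
    ("m2", 0b11100000),
    ("m3", 0b00000011),
    ("m4", 0b10000001),
]

top = screen(library, probes, top_k=2)
print(top)
```

Because `library` can be any iterable, the same function works on a stream of compounds without loading the whole library. At the quoted rate of about 1M comparisons per second, a single probe against a 200M-compound library would take on the order of 200 seconds.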
GPU - Case Study 1
Fast Computation of Molecular Properties for Extremely Large Chemical Libraries
Our GPU-accelerated cheminformatics platform is able to compute key molecular properties for GDB-13 (855M), GDB-13-ABCDE (141M), and a subset of GDB-17 (50M) compounds.
[Figure panels: GDB-13, subset of 141 M; GDB-17, random sample of 50 M]
GPU - Case Study 2
Virtual Screening of Very Large Chemical Libraries to Identify Bioactive Compounds
- Lacosamide (trade name Vimpat) is an anticonvulsant drug used to prevent seizures in patients treated for epilepsy;
- Functionalized amino acid;
- Many active analogues have been synthesized in Prof. Harold Kohn's laboratory* at UNC-CH.
[Figure: structure of lacosamide]
*Wang et al., 2011, ACS Chem Neurosci, 2, 90-106
GPU - Case Study 2
Similarity search using lacosamide as molecular probe
200M compound subset of GDB-13/17
[Figure: structures of Analog 1, Analog 2, Analog 3, Analog 4, and Analog 5]
GPU - Case Study 2
The GPU-accelerated screening platform was able to retrieve:
- known active analogues of lacosamide,
- several functionalized amino acids present in GDB-13,
- a novel compound (Gdb17-44140083) fully matching the pharmacophore of lacosamide.

Compound ID        Tanimoto Ts
Analog 2           0.997
Analog 3           0.995
Analog 1           0.994
Analog 4           0.992
Analog 5           0.978
Gdb13-a10573585    0.977
Gdb13-b28137563    0.977
Gdb13-a36264983    0.976
Gdb13-a36264952    0.976
Gdb13-a10616005    0.976
Gdb13-a3011053     0.976
Gdb13-b21242261    0.976
Gdb17-44140083     0.976
Gdb13-a30878321    0.975
Gdb13-b3485216     0.975
In Summary
- GPU-accelerated cheminformatics platform for high performance virtual screening of extremely large chemical libraries.
- Tested for (1) the analysis of the largest publicly available dataset, GDB-13 (~900M compounds), and (2) the screening of a ~200M compound library by similarity search using an anticonvulsant drug as the molecular probe.
- Our platform aims to virtually screen billions of compounds using similarity filters and QSAR models.
Acknowledgements
Professor Alex Tropsha (UNC-CH)
Colleagues at MML laboratory
NVIDIA & Mark Berger for generous hardware donation
Funding
- NSF ABI program
- Office of Naval Research
Molecular fingerprints - bit string encodings of structural features
and/or calculated molecular properties.
INFORMATION ABOUT THE PRESENCE OF MOLECULAR FRAGMENTS

1 FRAGMENT IS PRESENT
0 FRAGMENT IS ABSENT
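The present/absent encoding can be illustrated with a toy fingerprint. This sketch sets a bit when a fragment string occurs in a SMILES string; plain substring matching is not real substructure search (a cheminformatics toolkit would be needed for that), and the fragment list is hypothetical. The first SMILES is lacosamide written without stereochemistry and should be treated as illustrative.

```python
# Hypothetical fragment dictionary: each bit position stands for one fragment.
FRAGMENT_KEYS = ["C(=O)N", "c1ccccc1", "OC", "Cl", "N"]


def toy_fingerprint(smiles):
    """Return a bit string: '1' if the fragment text occurs in the SMILES, else '0'.

    Substring matching is a simplification used only to show the encoding.
    """
    return "".join("1" if fragment in smiles else "0" for fragment in FRAGMENT_KEYS)


lacosamide_fp = toy_fingerprint("CC(=O)NC(COC)C(=O)NCc1ccccc1")
chloropropane_fp = toy_fingerprint("CCCCl")
print(lacosamide_fp, chloropropane_fp)
```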
Similarity Search
Similarity searching using fingerprint representations of molecules is one of the
most widely used approaches for chemical database mining: it assumes that
similar compounds possess similar biological activities.
Tanimoto Coefficient
From J. Bajorath, SSS Cheminformatics, Obernai 2008
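The formula itself did not survive on this slide. In its standard form, for two bit-string fingerprints with a and b bits set and c bits set in both, the Tanimoto coefficient is Tc = c / (a + b - c). A minimal sketch follows, with made-up six-bit fingerprints and hypothetical compound names; returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient Tc = c / (a + b - c) for two equal-length bit strings."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    a = bits_a.count("1")                      # bits set in the first fingerprint
    b = bits_b.count("1")                      # bits set in the second fingerprint
    c = sum(1 for x, y in zip(bits_a, bits_b) if x == y == "1")  # bits set in both
    denominator = a + b - c
    return c / denominator if denominator else 0.0


# Made-up fingerprints for a probe and three candidates.
probe = "110110"
candidates = {"cand-1": "110100", "cand-2": "001001", "cand-3": "110110"}

# Rank candidates from most to least similar to the probe.
ranking = sorted(candidates, key=lambda name: tanimoto(probe, candidates[name]), reverse=True)
print(ranking)
```

The coefficient ranges from 0 (no shared bits) to 1 (identical fingerprints), so a similarity search reduces to sorting the library by this score.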
