Joshua Lederberg
Edward Feigenbaum
Carl Djerassi
J. Lederberg, E.A. Feigenbaum, C. Djerassi. Application of Artificial Intelligence to Chemical Inference. I. The Number of Possible Organic Compounds. Acyclic Structures Containing C, H, O and N. J. Am. Chem. Soc., 1969, V. 91, P. 2973
Morton Munk
D.B. Nelson, M.E. Munk, K.B. Gasli, D.L. Horald. Alanylactinobicyclone. An Application of Computer Techniques to Structure Elucidation. J. Org. Chem., 1969, V. 34, P. 3800
Shin-Ichi Sasaki
S.I.Sasaki, H.Abe, T.Ouki, M.Sakamoto and S.Ochiai. Automated Structure Elucidation of Several Kinds of Aliphatic and Alicyclic Compounds. Analytical Chemistry, 1968, V. 40, p. 2220
M.E. Elyashberg, L.A. Gribov. Formal logical interpretation of IR spectra using characteristic frequencies. Zhurn. Appl. Spectrosc. (J. Appl. Spectrosc.) 8, 1968, 998.
[Figure: 3D model, mass spectrum, IR/Raman spectra, NMR spectrum]
Number of possible isomers: C12H12O3, 68,930,547,646; C13H20O3, 14,431,269,166.
Superfluous isomers are eliminated from the full set by imposing various structural constraints.
Sources of structural constraints: spectra and a priori information (sample origin, chemical rules, etc.).
[Figure: direct and inverse problems of structure elucidation]
Direct problems: structures, molecular formulae, nominal mass.
Inverse problems:
1. Spectrum-structure correlations.
2. Structure generation from atoms and fragments; structural and spectral filtering of isomers.
3. Spectrum prediction for candidate structures (separate section); choice of the most probable structure.
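Going from a molecular formula to its nominal mass is one of the simplest direct problems in the scheme above. The sketch below is only an illustration, not part of StrucEluc. It assumes integer masses of the most abundant isotopes and handles simple formulas without brackets, charges or isotope labels. The names `NOMINAL_MASS`, `parse_formula` and `nominal_mass` are hypothetical.

```python
import re

# Nominal (integer) masses of the most abundant isotope of each element.
# Only elements occurring in this section are listed; extend as needed.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32, "Cl": 35, "Br": 79}


def parse_formula(formula: str) -> dict:
    """Split a simple formula such as 'C12H12O3' into element counts.

    Handles element symbols with optional counts only (no brackets,
    charges or isotope labels).
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def nominal_mass(formula: str) -> int:
    """Direct problem: molecular formula -> nominal molecular mass."""
    return sum(NOMINAL_MASS[el] * n for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    for mf in ("C12H12O3", "C13H20O3", "C32H22N4O"):
        print(mf, nominal_mass(mf))
```

For C32H22N4O the sketch gives 478, which is consistent with the MH+ = 479 ion reported for DP-2 later in this section.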
L.A. Gribov, M.E. Elyashberg. Computer-Assisted Identification of Organic Molecules by their Molecular Spectra. 1979. Monographic review.
[Figure: example structures, 1986 and 2006]
[Figure: 2D NMR spectra (F2 and F1 chemical shift axes, ppm). 13C-1H spectrum: 1J C-H correlations. 1H-1H (COSY) spectrum: 3J H-H correlations. HMBC: correlations of C-1 with H-i and H-k across two and three bonds are indistinguishable! Chart: ratio vs. problem number.]
HMBC: if a peak (H-1, C-2) is observed in the HMBC spectrum, then atom C-2 and atom C-1, which bears H-1, are separated in the structure by one or two chemical bonds.
NOESY: if a peak (H-1, H-2) is observed in the NOESY (ROESY) spectrum, then the distance between H-1 and H-2 in space is less than 5 Å.
K.A. Blinov, D. Carlson, M.E. Elyashberg et al. Magn. Reson. Chem. 2003, 41, 359-372. M.E. Elyashberg, K.A. Blinov, S.G. Molodtsov et al. J. Chem. Inf. Comput. Sci. 2004, 44, 771-792
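The HMBC and COSY axioms above can be read as limits on the topological distance between two skeletal atoms, which a candidate structure must respect. The following is a minimal sketch, assuming a candidate skeleton stored as a hypothetical adjacency dictionary and correlations given as `(atom_a, atom_b, max_len)` triples; it is not StrucEluc's implementation. The NOESY/ROESY rule is not covered, because it needs 3D geometry rather than bond counts.

```python
from collections import deque


def bond_distance(adjacency, start, goal):
    """Shortest path length, in bonds, between two skeletal atoms (breadth-first search)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        atom, dist = queue.popleft()
        if atom == goal:
            return dist
        for neighbour in adjacency[atom]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # the two atoms are not connected


def satisfies_correlations(adjacency, correlations):
    """Check 2D NMR correlations against a candidate skeleton.

    Each correlation is (atom_a, atom_b, max_len): atom_a carries the observed
    proton, atom_b is the correlated carbon (HMBC) or the carbon carrying the
    partner proton (COSY). Under the standard axioms max_len is 2 for HMBC and
    1 for COSY.
    """
    for a, b, max_len in correlations:
        d = bond_distance(adjacency, a, b)
        if d is None or not 1 <= d <= max_len:
            return False
    return True


# Hypothetical carbon chain C1-C2-C3-C4 (atoms numbered 1..4).
skeleton = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(satisfies_correlations(skeleton, [(1, 3, 2), (1, 2, 1)]))
print(satisfies_correlations(skeleton, [(1, 4, 2)]))  # C-1 and C-4 are 3 bonds apart
```

Making `max_len` a per-correlation parameter means that a correlation suspected to be nonstandard can be given a larger allowed length without changing the rest of the check.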
Distribution of 1.7 million fragments by number of skeletal atoms (max = 16) and number of carbon atoms (max = 10).
[Chart: number of fragments in the DB (1.7 million) vs. number of skeletal atoms; series: skeletal atoms, carbon atoms]
Usually, 10 to 100 of the fragments that the program selects using the 13C spectrum are actually present in the molecule under investigation.
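One simple way to decide whether a library fragment is compatible with a 13C spectrum is to require that every fragment carbon can be paired with a distinct experimental shift within a tolerance. The sketch below is a hypothetical illustration, not the system's actual fragment search. The 2 ppm tolerance, the fragment names and their shifts are assumptions; the experimental values are taken from one of the figures in this section.

```python
def fragment_matches(fragment_shifts, experimental_shifts, tol=2.0):
    """True if each fragment carbon can be paired with a distinct experimental
    13C shift lying within +/- tol ppm.

    Both lists are sorted, and each fragment shift takes the lowest unused
    experimental shift inside its window. Because all windows have the same
    width, this greedy pairing finds a complete matching whenever one exists.
    """
    experimental = sorted(experimental_shifts)
    used = [False] * len(experimental)
    for shift in sorted(fragment_shifts):
        for k, value in enumerate(experimental):
            if not used[k] and abs(value - shift) <= tol:
                used[k] = True
                break
        else:  # no free experimental shift inside the window
            return False
    return True


# Hypothetical fragment library (fragment name -> predicted 13C shifts, ppm).
experimental = [140.92, 128.76, 128.73, 124.81, 128.07, 138.16]
library = {"F1": [141.0, 128.5], "F2": [128.7, 128.7, 128.7], "F3": [170.0]}
selected = [name for name, shifts in library.items()
            if fragment_matches(shifts, experimental)]
print(selected)
```

A real search would also take multiplicities and neighbouring atoms into account; this check uses shift values alone.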
[Figure: C-C connectivities; table of HMBC connectivities; methods of 13C chemical shift prediction: 1, 2, 3]
Y.D. Smurnyy, K.A. Blinov, T.S.Churanova, M.E. Elyashberg, A.J. Williams. J. Chem. Inf. Model. 2008, 48, 128-134
[Figure: candidate structures 1, 2 (ID:3) and 3 (ID:2)]
Rank (ID)  dA(13C)  dI(13C)  dN(13C)  dA(1H)  dI(1H)  dN(1H)  dA Complex
1          0.415    1.789    1.650    0.058   0.134   0.144   3.085
2 (ID:3)   0.912    2.291    2.261    0.172   0.257   0.273   4.992
3 (ID:2)   0.992    2.267    2.329    0.173   0.255   0.276   5.084
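Deviations such as dA, dI and dN compare the experimental shifts with the shifts predicted for each candidate, and candidates are ranked so that the smallest deviation comes first. The sketch below assumes that a deviation is the mean absolute difference and that experimental and predicted shifts are already paired atom by atom. Those assumptions, the function names and the candidate data are all hypothetical.

```python
def mean_abs_deviation(experimental, predicted):
    """Average |delta_exp - delta_calc| in ppm over atoms paired by position."""
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted shifts must be paired one-to-one")
    return sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(experimental)


def rank_candidates(experimental, predictions):
    """Return (deviation, candidate_id) pairs sorted best (smallest) first."""
    scored = [(mean_abs_deviation(experimental, shifts), cid)
              for cid, shifts in predictions.items()]
    return sorted(scored)


# Hypothetical experimental shifts and predictions for two candidates.
experimental = [15.0, 107.0, 198.6]
predictions = {"cand_A": [18.0, 110.0, 195.6], "cand_B": [15.5, 106.0, 199.1]}
ranked = rank_candidates(experimental, predictions)
for rank, (dev, cid) in enumerate(ranked, start=1):
    print(rank, cid, round(dev, 3))
```

Running the same ranking with shifts from several prediction methods is what allows the per-method columns in the table to be compared.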
The higher speed and accuracy of chemical shift prediction influenced the strategy of the system.
Then: the output file had to be minimal. For this purpose, severe constraints (axioms) had to be introduced. Consequence: a high risk of losing the correct structure.
Now: the structure file may contain 10^5 or more structures (tcalc = 5-10 min). Severe constraints may be removed, so solutions have become more reliable.
[Figure: structure with 13C chemical shifts]
W.-G. Kim et al. Org. Lett., 2004, 6, 823-826; W. Steglich et al. Org. Lett., 2004, 6, 3175-3177; A. Bagno et al. Chem. Eur. J. 2006, 12, 5514-5525
Top of the ranked output file found by StrucEluc: k = 37176; after filtering, 149; after removing duplicates, 135; tg = 1 min 40 s.
[Figure: revised and original structures]
1 (ID:27), revised structure: dNN = 2.17; dN(13C): 2.650
4 (ID:140), original structure: dNN = 3.08; dN(13C): 3.075
[Figure: candidate structures A-F for C16H12N2O2, one of them with 13C chemical shifts, and a table of deviations for structures A-F]
[Figure: top-ranked structures 1 (ID:14), 2 (ID:1), 3 (ID:8) and 4 (ID:3) and the structure expected by the authors; C27H22N4O3; 13C chemical shifts; table of deviations (Data: INC, NN, QM)]
A. Balandina et al. Russ. Chem. Bull., Int. Ed., 2006, 55, 2256-2264
[Figure: candidate structures; the correct one is marked "Correct"]
Proposed structures with different molecular formulae (MFs), which were checked by DFT calculations.
Experimental MF = C27H22N4O3.
[Figure: proposed structures with MFs C27H23N4O2 (three variants) and C27H22N4O2; annotations: "Doublet!", "154 ~sp3!"]
[Figure: candidate structures 2 (ID:38), 3 (ID:17) and 4 (ID:30) with 13C chemical shifts; dI: 2.740, dN: 2.278]
[Figure: correlation lengths a: COSY, a = 1; HMBC, a = 2]
If the axioms on correlation length are violated, the data become contradictory.
[Figure: structure with numbered atoms 1-22; m = 15, a = 1-3]
M.E.Elyashberg, K.A.Blinov, S.G.Molodtsov et al. J. Chem. Inf. Model. 2007, 47, 1053-1066
[Figure: the same structure with numbered atoms]
The safest mode, {m < 15, a = x}: 40,225,345,056 combinations are theoretically possible; 10,637,725 connectivity combinations were used during structure generation. Solution: k = 28289; tg = 24 min; r = 1.
[Figure: example structures]
A new structure generation algorithm reveals symmetry in NMR data. The algorithm automatically adjusts itself to the generation of symmetric molecules.
Ionic structures
[Figure: examples of ionic structures]
"...Oh, if you knew from what rubbish poetry grows..." Anna Akhmatova
To overcome the lack of information, database fragments (1.7 million) and/or user fragments are used. Introducing fragments is necessary if:
1. The number of observed 2D NMR correlations is markedly smaller than the theoretically expected number.
2. There is a deficit of hydrogen atoms, so that even the theoretically expected number of correlations is too small.
Taking this into account, an algorithm for implanting fragments into the MCD was developed.
Number of correlations is small.
[Figure: Ashwagandhanolide, with 1H chemical shifts]
Fragments were found in the DB by a 13C NMR search. Number of fragments found: L = 5524.
[Figure: fragment #1 (17222): molecule, fragment, solution]
960 MCDs were created from fragment #1. Structure generation from the 960 MCDs: k = 960246, tg = 29 min 30 s.
[Figure: top-ranked output structures, including 2 (ID:16) and 3 (ID:12)]
dA(13C): 2.976 (v.10.05); dI(13C): 3.487; dN(13C): 3.590
Using fragments is not a panacea for all cases. Possible causes of failure:
1. Large fragments capable of helping to solve the problem are absent from the DB of the system.
2. Appropriate fragments are found or introduced by the chemist, but the number of possible shift assignments is so huge (more than 100 million) that CPU resources are insufficient (combinatorial explosion).
3. The number of MCDs created by the program is huge, and the structure generation CPU time becomes unacceptable.
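The "more than 100 million" figure is easy to reach. If nothing restricts the choice, the number of ways to assign k of n experimental signals to the k atoms of a fragment is n!/(n - k)!. The sketch below computes this upper bound with Python's `math.perm`. It is an illustration that assumes no pruning; a real system discards assignments that fall outside shift tolerances, so its actual counts are smaller. The function name is hypothetical.

```python
import math


def assignment_count(n_signals: int, n_fragment_atoms: int) -> int:
    """Number of ways to assign distinct experimental signals to the atoms of a
    fragment when nothing restricts the choice: n! / (n - k)!."""
    return math.perm(n_signals, n_fragment_atoms)


for n, k in [(12, 12), (30, 5), (30, 8)]:
    print(f"{k} fragment atoms, {n} signals: {assignment_count(n, k):,} assignments")
```

Even 12 atoms matched against 12 signals already give 12! = 479,001,600 assignments, so the count grows far faster than the fragment size.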
[Figure: structure with 13C chemical shifts]
C30H28O11, DBE = 17
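The double-bond equivalent (rings plus pi bonds) follows directly from the molecular formula: DBE = C - (H + X)/2 + N/2 + 1, where X counts halogens and divalent O and S do not contribute. The sketch below is a minimal illustration for simple formulas without brackets or charges. The function name `dbe` is hypothetical.

```python
import re


def dbe(formula: str) -> float:
    """Double-bond equivalent for a simple formula:
    DBE = C - (H + X)/2 + N/2 + 1, where X = F, Cl, Br, I.
    Divalent O and S do not contribute."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    c = counts.get("C", 0)
    h = counts.get("H", 0) + sum(counts.get(x, 0) for x in ("F", "Cl", "Br", "I"))
    n = counts.get("N", 0)
    return c - h / 2 + n / 2 + 1


print(dbe("C30H28O11"))
```

For C30H28O11 this gives 30 - 14 + 1 = 17, in agreement with the value quoted above.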
To introduce the 1,2,3,4,5-AR fragment, 4 million different assignments of shifts to the carbon atoms of the fragment must be checked.
[Figure: structures of cryptolepine-series alkaloids]
Alkaloids of the cryptolepine series whose 13C and 1H NMR signals have been assigned.
A User Fragment Data Base (UDB) was created; it contains 342 fragments.
[Figure: chromatogram showing DP-1 (35%) and DP-2 (16%)]
DP-2 separation and spectrum acquisition were performed by several groups in the USA.
DP-1 (35%, 1.1 mg), DP-2 (16%, 200 µg).
DP-2: solution of 100 µg in 150 µL of DMSO-d6; 3 mm tube; T = 25 °C; HSQC (17 h), HMBC (17 h), 1H-15N HMBC (72 h; sensitivity to 15N is 50 times lower than to 13C), ROESY. From MS (MS/MS): MH+ = 479, C32H22N4O.
[Figure: ranked candidate structures 2 (ID:38), 3 (ID:114), 4 (ID:119), 5 (ID:461), 7 (ID:422) and 8 (ID:93), each with dA(13C), dF(13C) and dA(1H) deviations]
The solution was found using StrucEluc in interactive mode: over 12 hours of program operation, the spectroscopist transformed the initial MCD into the final one.
The ROESY spectrum provided the first criterion for choosing the correct structure (r < 5 Å).
[Figure: alternative arrangements giving 1 peak or 2 peaks in ROESY; interproton distances 2.5 Å and 5.9 Å]
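The ROESY criterion can be applied to a 3D model of each candidate: every observed ROE pair must be closer than 5 Å. The sketch below assumes that proton coordinates in ångströms are already available for the candidate. The labels, coordinates and function name are hypothetical; the coordinates simply reproduce the 2.5 Å and 5.9 Å distances shown in the figure.

```python
import math


def roe_consistent(coords, roe_pairs, r_max=5.0):
    """True if every observed ROE pair is closer than r_max (angstrom) in the
    candidate's 3D model. coords maps a proton label to (x, y, z) in angstrom."""
    return all(math.dist(coords[a], coords[b]) < r_max for a, b in roe_pairs)


# Hypothetical geometry reproducing the 2.5 and 5.9 angstrom distances.
model = {"CH3": (0.0, 0.0, 0.0), "H-a": (2.5, 0.0, 0.0), "H-b": (5.9, 0.0, 0.0)}
print(roe_consistent(model, [("CH3", "H-a")]))
print(roe_consistent(model, [("CH3", "H-b")]))  # 5.9 > 5, so the pair is rejected
```

A candidate that places an observed ROE pair farther apart than 5 Å fails the check and can be dropped from the ranked list.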
The two strongest peaks in the mass spectrum are at m/z 232 and 217 (232 + 217 = M). Second criterion: each peak can be assigned to either the upper or the lower part of the molecule.
[Figure: assignment of the MS peaks m/z 232 and m/z 217 to the two parts of the molecule; ranked candidates, including TC-6, 2 (ID:85), 3 (ID:36), 4 (ID:334) and 5-12, with dA(13C), dF(13C), dA(1H) and d(MS) deviations, each marked according to the ROE and MS criteria]
For the first time, an expert system (ES) made it possible to solve a structural problem that a prominent expert in NMR spectroscopy and structure elucidation had failed to solve.
StrucEluc was enhanced with an algorithm for determining the most probable relative stereochemistry of rigid structures. Stereochemistry is determined using NOESY/ROESY data. For structures with more than 7 stereocenters, geometry optimization is performed by means of a genetic algorithm (GA).
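A genetic algorithm avoids enumerating all 2^N configurations. It evolves a population of candidate configurations through selection, crossover and mutation. The toy sketch below shows only these GA mechanics. The real system evaluates 3D geometries against NOESY/ROESY data, whereas here the fitness is replaced by hypothetical pairwise constraints ("centres i and j have the same or opposite configuration"). All names, parameters and constraints are illustrative assumptions.

```python
import random


def fitness(config, constraints):
    """Count satisfied constraints. config[i] is 0 or 1 (the two configurations
    of stereocentre i); each constraint (i, j, same) requires centres i and j to
    have equal (same=True) or opposite (same=False) configuration."""
    return sum((config[i] == config[j]) == same for i, j, same in constraints)


def genetic_search(n_centres, constraints, pop_size=40, generations=100,
                   mutation_rate=0.05, seed=0):
    """Toy GA: truncation selection (the better half survives unchanged),
    one-point crossover and bit-flip mutation. Requires n_centres >= 2."""
    rng = random.Random(seed)

    def score(config):
        return fitness(config, constraints)

    population = [[rng.randint(0, 1) for _ in range(n_centres)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_centres)          # one-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)


# Hypothetical example: 23 centres linked by "same configuration" constraints.
rules = [(i, i + 1, True) for i in range(22)]
best = genetic_search(23, rules)
print(fitness(best, rules), "of", len(rules), "constraints satisfied")
```

Because the better half of each generation is kept unchanged, the best fitness found never decreases from one generation to the next.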
Brevetoxin B
Number of stereocenters: N = 23. Number of stereoisomers: ~8,400,000. CPU time needed to optimize the geometry of all 8.4 million stereoisomers: ~1 month.
The configuration of all 23 stereocenters was correctly determined by the GA in 2 h 50 min.
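These figures can be checked with simple arithmetic. The check assumes that each of the N = 23 stereocenters independently takes one of two configurations and that one month is about 30 days:

$$N_{\text{stereo}} = 2^{23} = 8\,388\,608 \approx 8.4\times 10^{6}$$

$$\bar{t} \approx \frac{30 \times 86\,400\ \text{s}}{8.4\times 10^{6}} \approx 0.3\ \text{s per stereoisomer}$$

$$\frac{30 \times 86\,400\ \text{s}}{2\ \text{h}\ 50\ \text{min}} = \frac{2\,592\,000\ \text{s}}{10\,200\ \text{s}} \approx 250$$

On this basis, the GA run was roughly 250 times faster than exhaustive optimization.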
[Figure: structure of brevetoxin B]
Efficiency of Structure Elucidator: the efficiency of the system has been proved by the structure elucidation of ~300 natural products.
Continually solving new, complicated problems is the basis for the creation and further development of Structure Elucidator.
It can be expected that an expert system similar to Structure Elucidator will serve as the kernel of a research center for molecular structure elucidation and investigation.
Expert systems like StrucEluc will come into widespread use within the next 5-10 years. They will become a routine tool in laboratories engaged in spectroscopy, organic chemistry, natural product chemistry and analytical chemistry.