Phil Evans
Diffraction from a crystal & the Laue equations Reciprocal lattice Ewald construction diffraction geometry
[Flowchart: Images → Indexing → Integration → hkl, I, σ(I) → Space group determination, quality assessment → hkl, F, σ(F)]
Decisions
Is this your best crystal? Mosaicity, resolution, size, ice.
Total rotation, rotation/image (overlaps), exposure time, position of detector.
What is the correct lattice? [Integration parameters: box size, overlap check]
What Laue group, space group?
How good is the dataset? Any bad bits? Is the crystal twinned?
Integration steps: Index (choose lattice) → Refine unit cell → Integrate → Choose Laue group (point group) → Scale & merge → Convert I to F
Diffracts to high angle.
Large: the diffracted intensity is proportional to the number of unit cells in the beam, so there is not much gain for a crystal much larger than the beam (typically 50–200 µm). Smaller crystals may freeze better (lower mosaicity).
Low mosaicity: better signal/noise.
Good freeze: no ice, minimum amount of liquid (low background), low mosaicity. Optimise the cryo procedure.
The best that you have! (the least worst). Beware of pathological cases. The quality of the crystal determines the quality of the dataset.
[Figure: diffraction images at φ = 0° and φ = 90°]
[Flowchart: POINTLESS → SCALA (scale/merge) → detwin → TRUNCATE (convert I to F)]
Integration
Two distinct methods:
2-D: integrate spots on each image, add together partially recorded observations in the scaling program. MOSFLM, DENZO, HKL2000, etc.
3-D: integrate a 3-dimensional box around each spot, from a series of images. XDS, D*TREK, SAINT, etc.
For today: MOSFLM
Starting point: a series of diffraction images, each recorded on a 2D area detector while rotating the crystal through a small angle (typically 0.2–1.0° per image) about a fixed axis (the Rotation/Oscillation Method).
Outcome: a dataset consisting of the indices (h,k,l) of all reflections recorded on the images, with an estimate of their intensities and the standard uncertainties of the intensities: h, k, l, I(hkl), σ(I)
integration slides from Andrew Leslie
Note that a series of images samples the full 3-dimensional reciprocal space: Bragg diffraction and any other phenomena, i.e. all scattering from the crystal and its environment.
In practice, defects in the crystals (or detectors) make integration far from trivial, eg weak diffraction, crystal splitting, anisotropic diffraction, diffuse scattering, ice rings/spots, high mosaicity, unresolved spots, overloaded spots, zingers/cosmic rays, etc, etc.
Integration
We want to calculate the intensity of each spot, then work backwards. The simplest method is to draw a box around each spot, add up all the numbers inside, and subtract the background (or better, fit a profile). To do this, we must know where the spot is. This needs:
the unit cell of the crystal
the orientation of the crystal relative to the camera
the exact position of the detector
To find the unit cell and crystal orientation, we must index the diffraction pattern. This can be done by finding spots on one or more images.
Image Display
Simple control over: found spots, predicted pattern, direct beam position, resolution limits, masking function, panning and zooming.
Indexing
If we know the main beam position on the image, we can count spots from the centre.
[Figure: reciprocal lattice layer l = 0 with axes a* and b*; example spot (3,1,0)]
To do it properly, we need to put the spots into 3 dimensions, knowing the rotation of the crystal for this image.
[Figure: reciprocal lattice layers l = 1 and l = 2]
Back-project each spot on to the Ewald sphere, then rotate back into the zero-φ frame.
Autoindexing
Objective: to determine the unit cell, likely symmetry and orientation. (Note that intensities are required to find the true symmetry, see later.)
The spot positions in a diffraction image are a distorted projection of the reciprocal lattice. Using the Ewald sphere construction, the observed reflections (Xd, Yd, φ) can be mapped back into reciprocal space, giving a set of scattering vectors s_i:
$$\mathbf{s} = \begin{pmatrix} D/r - 1 \\ X_d/r \\ Y_d/r \end{pmatrix}, \qquad r = \sqrt{X_d^2 + Y_d^2 + D^2}$$
where D is the crystal-to-detector distance.
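The mapping from detector coordinates to a scattering vector can be sketched in a few lines of Python. This is a minimal illustration, not Mosflm's code: the function name `scattering_vector` is hypothetical, and the choice of the third component as the spindle direction is an assumption made only so that the rotation back to the φ = 0 frame can be shown.

```python
import math

def scattering_vector(xd, yd, dist, phi_deg):
    """Scattering vector for a spot at detector position (xd, yd).

    xd, yd are measured from the direct beam position and dist is the
    crystal-to-detector distance D, all in the same length unit.
    The result is dimensionless (its length is 2*sin(theta)).
    """
    r = math.sqrt(xd * xd + yd * yd + dist * dist)
    # Components at the rotation angle of the image: (D/r - 1, Xd/r, Yd/r).
    s = (dist / r - 1.0, xd / r, yd / r)
    # ASSUMPTION: the spindle lies along the third component, so rotating
    # by -phi about it returns the vector to the phi = 0 frame.
    c = math.cos(math.radians(phi_deg))
    sn = math.sin(math.radians(phi_deg))
    return (c * s[0] + sn * s[1], -sn * s[0] + c * s[1], s[2])
```

A spot at the direct beam position maps to the origin of reciprocal space, and the rotation leaves the length of the vector unchanged.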
Consider every possible direction in turn as a possible real-space axis, i.e. perpendicular to a reciprocal lattice plane. Project all observed vectors on to this axis.
[Figure: projected vectors and their Fourier transform; labels a and 1/a]
For a direction normal to a lattice plane, the projected vectors cluster at lengths which are multiples of the lattice spacing, and the Fourier transform shows sharp peaks.
Pick three non-coplanar directions which have the largest peaks in the Fourier transforms to define a lattice. This is not necessarily the simplest lattice (the reduced cell).
In the 2D example shown, the black cell corresponds to the reduced cell, while the red or blue cells may have been found in the autoindexing.
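The projection-and-transform step for one trial direction can be sketched as follows. This is a simplified illustration under stated assumptions: it evaluates the Fourier sum directly at a list of trial real-space lengths instead of binning the projections and running an FFT, and the function name `direction_spectrum` is hypothetical.

```python
import numpy as np

def direction_spectrum(vectors, direction, trial_lengths):
    """Fourier amplitude of the projected vectors at each trial length.

    vectors: (n, 3) reciprocal-space vectors; direction: trial axis;
    trial_lengths: candidate real-space repeat lengths.
    Returns amplitudes normalised to 1.0 for a perfect repeat.
    """
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    # Project every observed vector on to the trial direction.
    proj = np.asarray(vectors, dtype=float) @ d
    # Direct Fourier sum (a stand-in for the binned FFT described above).
    phases = 2j * np.pi * np.outer(trial_lengths, proj)
    return np.abs(np.exp(phases).sum(axis=1)) / len(proj)
```

If the trial direction is a true real-space axis of length a, the projections are multiples of 1/a and the amplitude peaks at the trial length a. Repeating this over many directions and keeping the three strongest non-coplanar peaks gives a candidate lattice.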
Autoindexing Window
A penalty is associated with each solution, which reflects how well the determined cell obeys the constraints for that lattice type.
If nothing is known about the crystal, choose an initial solution in the following way:
correct solutions usually have penalties < ~20, often < 10 and rarely > 30; the errors (σ[x,y] and σ[φ]) should also be small
note where there is a sharp drop in the penalty (in this case below solution 8)
pick the solution with the highest symmetry with a penalty lower than the sharp drop (in this case, solution 7)
Note that the solutions listed by Mosflm are in fact all the same solution, with different lattice symmetries imposed, so if the triclinic solution (number 1) is wrong, then all the others are too.
Mosaicity Estimation
Predict the pattern with increasing values for the mosaic spread (e.g. 0.0, 0.05, 0.1, 0.15 degrees). In each case, measure the total intensity of all predicted reflections. The mosaicity can be estimated from the plot of total intensity vs mosaic spread.
1. Find spots: what is a spot? It should have a uniform shape, not a streak
2. Index: find the lattice which fits the spots
3. Estimate mosaicity: improve the estimate later
4. Check the prediction on images remote in φ (90° away): is the indexing correct?
5. Refine cell: use two wedges at 90°
6. Mask backstop shadow: not done automatically by the program
7. Integrate one (or a few) images to check resolution etc
8. Integrate all images: run in background for speed
Cell refinement
Cell refinement does not work well at low resolution (> ~3 Å). Just take values from indexing of several images.
Parameter Refinement
Generally, once an orientation matrix and cell parameters have been derived from the autoindexing procedures described, these parameters are refined further using different algorithms. Parameters to be refined:
1) Crystal parameters: cell dimensions, orientation, mosaic spread.
2) Detector parameters: detector position, orientation and (if appropriate) distortion parameters.
3) Beam parameters (possibly): orientation, beam divergence.
Two residuals are used:
1) Using spot coordinates and a positional residual:
$$\Omega_1 = \sum_i \left[ \omega_{ix} (X_i^{calc} - X_i^{obs})^2 + \omega_{iy} (Y_i^{calc} - Y_i^{obs})^2 \right]$$
2) Using spot position in φ and an angular residual:
$$\Omega_2 = \sum_i \omega_i \left[ (R_i^{calc} - R_i^{obs}) / d_i^* \right]^2$$
where $R_i^{calc}$, $R_i^{obs}$ are the calculated and observed distances of the reciprocal lattice point $d_i^*$ from the centre of the Ewald sphere (post-refinement).
The positional residual gives no information about small errors in the crystal orientation around the spindle axis, or about the mosaic spread. The angular residual gives no information on the detector parameters (because it does not depend on spot positions).
Refine cell, orientation and mosaicity to minimise the angular residual (Ω2):
Consider a partially recorded reflection spread over two images, with a recorded intensity I1 on the first and I2 on the second. To determine the observed position P from the fraction of the total intensity that is observed on the first image, F = I1/(I1+I2), requires a model for the rocking curve, e.g.:
[Equation: rocking-curve model]
Knowing F and the parameters of the rocking-curve model, the distance of P from the sphere can be calculated, giving R^obs. (The plus or minus sign depends on whether the rlp is entering or exiting the sphere.)
1) Predicting reflection positions
Accuracy in prediction is crucial. Ideally, cell parameters should be known to better than 0.1%. Errors in prediction will introduce systematic errors in profile fitting. Typically the detector parameters, crystal orientation and mosaic spread will be refined for every image during the integration. The cell parameters are not normally refined.
Integration window
Summation integration and Profile Fitting
Summation integration: sum the pixel values of all pixels in the peak area of the mask, and then subtract the sum of the background values calculated from the background plane for the same pixels.
Profile fitting: assume that the shape or profile (in 2 or 3 dimensions) of the spots is known. Then determine the scale factor which, when applied to the known spot profile, gives the best fit to the observed spot profile. This scale factor is then proportional to the profile-fitted intensity for the reflection.
Minimise:
$$R = \sum_i \omega_i (X_i - K P_i)^2$$
X_i is the background-subtracted intensity at pixel i
P_i is the value of the standard profile at the corresponding pixel
ω_i is a weight, derived from the expected variance of X_i
K is the scale factor to be determined
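The two estimates can be compared in a short Python sketch. It is a minimal illustration, assuming the pixels of one spot are supplied as flat lists; the function names are hypothetical. Setting the derivative of the weighted residual with respect to K to zero gives the closed form K = Σ ω X P / Σ ω P², which is what `profile_fit_scale` evaluates.

```python
def summation_intensity(peak_pixels, background):
    """Sum of peak pixel values minus the background under the same pixels."""
    return sum(p - b for p, b in zip(peak_pixels, background))

def profile_fit_scale(x, profile, weights):
    """Least-squares scale K minimising sum_i w_i * (X_i - K * P_i)**2.

    x: background-subtracted pixel values, profile: standard profile values,
    weights: per-pixel weights (e.g. inverse expected variances).
    """
    num = sum(w * xi * pi for w, xi, pi in zip(weights, x, profile))
    den = sum(w * pi * pi for w, pi in zip(weights, profile))
    return num / den
```

For a spot that is an exact multiple of the standard profile, the fitted K equals that multiple whatever the weights; the weights matter only when the pixels are noisy.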
Profile fitting assumes that the spot shape is independent of the spot intensity. For non-saturated spots this is a valid assumption, in spite of the different appearance of strong and weak spots in the image.
All these spots are fully recorded; the weaker spots look smaller because the signal is lost in the background.
Determining the "Standard" Profile
The profiles are determined empirically (as the average of many spots). The spot shape varies according to position on the detector, and this must be allowed for (different programs do this in different ways).
[Figure: profile in centre and profile at edge of detector]
Precautions are needed to avoid introducing systematic errors due to broadening of the profiles during averaging. For each reflection integrated, a new profile is calculated as a weighted mean of the standard profiles for the adjacent regions. Profile fitting is used for both fully recorded and partially recorded reflections. Although this is strictly not valid, in practice it works well.
Standard Deviation Estimates
For summation integration or profile-fitted partially recorded reflections, a standard deviation can be obtained based on Poisson statistics. For profile-fitted intensities, the goodness of fit of the scaled standard profile to the true reflection profile can be used for fully recorded reflections. These will generally underestimate the true errors, and should be modified accordingly at the merging step (see later) so that they reflect the actual differences between multiple (symmetry-related) measurements. It is important to get realistic estimates of the errors in the intensities.
Summary of the steps in data integration
1) Autoindexing, preferably using two orthogonal images, will give the crystal cell parameters, orientation and a suggestion of the lattice symmetry. Using this information an initial estimate of the mosaicity can be obtained.
2) Post-refinement requires the integration of a series of images, and uses the observed distribution of intensity of partially recorded reflections over those images to refine the unit cell and mosaic spread. Best carried out prior to integration of the data set.
3) During integration of the entire data set, the cell parameters are normally fixed, but the detector parameters, crystal orientation and mosaic spread are refined to ensure the best prediction of spot positions.
4) Intensities are estimated by both summation integration and profile fitting, but generally the profile-fitted values are used for structure solution.
Strategy window
Total rotation range: ideally 180° (or 360° in P1 to get full anomalous data). Use programs (e.g. Mosflm) to give you the smallest required range (e.g. 90° for orthorhombic, or 2 x 30°) and the start point.
Rotation/image: not necessarily 1°! Good values are often in the range 0.25–0.5°; minimise overlap and background.
Time/image: depends on total time available.
Detector position: further away to reduce background and improve spot resolution.
Two Cases:
Anomalous scattering, MAD
High redundancy is better than long exposures (eliminates outliers).
Split time between all wavelengths, be cautious about radiation damage, reduce time and thus resolution if necessary.
Collect Bijvoet pairs close(ish) together in time: align along a dyad or collect inverse-beam images.
Recollect the first part of the data at the end to assess radiation damage.
Data for refinement
Maximise resolution: longer exposure time (but still beware of radiation damage).
High multiplicity is less important, but still useful.
Use two (or more) passes with different exposure times (ratio ~10) if necessary to extend the range of intensities (high and low resolution).
Short wavelength (< 1 Å) to minimise absorption.
Collect symmetry mates at different times and in different geometries, to get the best average (even with higher Rmerge!). Rotate about different axes.
Decisions
Select crystal. Collect a few images to judge quality. Decide strategy and collect all images.
Integration steps: Index (choose lattice) → Refine unit cell → Integrate → Choose Laue group (point group) → Scale & merge → Convert I to F
How good is the dataset? Any bad bits? Is the crystal twinned?
By examining the diffraction pattern we can get a good idea of the likely space group. It is also useful to find the likely symmetry as early as possible, since this affects the data collection strategy.
However, these restrictions may occur accidentally, or from pseudo-symmetry, so we need to score deviations between experimental cell dimensions and ideal values: for this we need estimates of the errors. Various penalty functions have been used.
2. Laue group symmetry (Patterson group)
The Laue group is the symmetry of the diffraction pattern, so it can be determined from the observed intensities. It corresponds to the space group without any translations, and with an added centre of symmetry from Friedel's law.
3. Point group symmetry
For chiral space groups (i.e. all macromolecular crystals), there is only one point group corresponding to each Laue group. It corresponds to the space group without any translations.
4. Space group symmetry
Point group + translations (e.g. screw dyad rather than pure dyad). Only visible in the diffraction pattern as systematic absences, usually along axes. These are not very reliable indicators, as there are few axial reflections and there may be accidental absences.
[Screenshot: CCP4i interface to POINTLESS, general options]
This C-centred orthorhombic cell has b ≈ √3 a, so it can also be indexed on a hexagonal lattice, lattice point group P622 (P6/mmm), with the reindex operator: h/2+k/2, h/2-k/2, -l
Conversely, a hexagonal lattice may be indexed as C222 in three distinct ways, so there is a 2 in 3 chance of the indexing program choosing the wrong one.
[Figure: hexagonal axes (black)]
The distinction between the possibilities depends on the symmetry of the intensities, not the lattice symmetry
Likelihood
Nelmt  Lklhd  (unlabelled)  Symmetry element       Operator(s)
  1    0.808  0.115         identity
  2    0.828  0.141  ***    2-fold l ( 0 0 1)      {-h,-k,+l}
  3    0.000  0.527         2-fold   ( 1-1 0)      {-k,-h,-l}
  4    0.871  0.100  ***    2-fold   ( 2-1 0)      {+h,-h-k,-l}
  5    0.000  0.559         2-fold h ( 1 0 0)      {+h+k,-k,-l}
  6    0.000  0.562         2-fold   ( 1 1 0)      {+k,+h,-l}
  7    0.870  0.087  ***    2-fold k ( 0 1 0)      {-h,+h+k,-l}
  8    0.000  0.540         2-fold   (-1 2 0)      {-h-k,+k,-l}
  9    0.000  0.598         3-fold l ( 0 0 1)      {-h-k,+h,+l} {+k,-h-k,+l}
 10    0.000  0.582         6-fold l ( 0 0 1)      {-k,+h+k,+l} {+h+k,-h,+l}
Likelihood
Correlation coefficient & R-factor
      Laue Group      Lklhd  NetZc   Zc+   Zc-     CC   CC-  Rmeas    R-
>  1  C m m m    ***  0.991   6.00  6.12  0.12   0.93  0.02   0.12  0.56
>  2  C 1 2/m 1       0.367   5.00  6.13  1.13   0.95  0.17   0.10  0.48
>  3  C 1 2/m 1       0.365   4.55  6.04  1.49   0.95  0.22   0.09  0.46
>  4  P 1 2/m 1       0.250   4.88  5.99  1.11   0.91  0.17   0.14  0.49
   5  P -1            0.031   4.27  5.94  1.67   0.89  0.25   0.12  0.44
   6  C 1 2/m 1       0.000   2.45  4.18  1.73   0.08  0.26   0.54  0.44
   7  C 1 2/m 1       0.000   1.62  3.40  1.79   0.08  0.27   0.56  0.43
   8  C 1 2/m 1       0.000   0.60  2.55  1.95   0.01  0.29   0.56  0.42
   9  C 1 2/m 1       0.000   0.57  2.52  1.96   0.01  0.29   0.53  0.43
  10  P -3            0.000   0.75  2.68  1.93  -0.02  0.29   0.60  0.42
  11  C m m m         0.000   2.60  3.80  1.20   0.44  0.18   0.38  0.47
 =12  C m m m         0.000   0.94  2.59  1.65   0.26  0.25   0.42  0.46
  13  P 6/m           0.000   0.83  2.54  1.70   0.24  0.26   0.45  0.44
  14  P -3 m 1        0.000   0.72  2.46  1.74   0.24  0.26   0.45  0.44
  15  P -3 1 m        0.000  -0.57  1.79  2.36   0.10  0.35   0.52  0.39
  16  P 6/m m m       0.000   2.09  2.09  0.00   0.25  0.00   0.44  0.00
Systematic absences
For a p_q screw axis along, say, c (index l), axial reflections are only present if l = (p/q)n, where n is an integer, e.g.
Screw axis    Condition    Present l
2_1           2n           2, 4, 6, ...
3_1           3n           3, 6, 9, ...
4_1, 4_3      4n           4, 8, 12, ...
4_2           2n           2, 4, 6, ...
6_1, 6_5      6n           6, 12, 18, ...
6_2, 6_4      3n           3, 6, 9, ...
6_3           2n           2, 4, 6, ...
BUT we may only have observed a few of the axial reflections, so be careful.
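The presence rule in the table can be written as a one-line test. A small sketch, assuming integer l; the function name is hypothetical. Requiring l = (p/q)n for some integer n is the same as requiring l·q to be divisible by p.

```python
def axial_reflection_present(l, p, q):
    """True if the axial reflection (0, 0, l) is allowed for a p_q screw axis.

    l = (p/q) * n for an integer n is equivalent to (l * q) % p == 0.
    """
    return (l * q) % p == 0
```

For example, a 6_4 axis allows only l = 3, 6, 9, ..., while a 6_5 axis allows only l = 6, 12, ..., in agreement with the table.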
[POINTLESS output: probability of screw axis; Number 109, PeakHeight 0.878, SD 0.083]
Reindex
Alternative indexing
If the true point group is lower symmetry than the lattice group, alternative valid but non-equivalent indexing schemes are possible, related by symmetry operators present in the lattice group but not in the point group (these are also the cases where merohedral twinning is possible).
E.g. in space group P3 there are 4 different schemes: (h,k,l) or (-h,-k,l) or (k,h,-l) or (-k,-h,-l)
For the first crystal, you can choose any scheme.
For subsequent crystals, the autoindexing will randomly choose one setting, and we need to make it consistent: POINTLESS will do this for you by comparing the unmerged test data to a merged reference dataset.
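The idea of choosing the scheme that agrees best with a reference can be sketched as below. This is an illustration only, not the POINTLESS algorithm: it assumes both datasets are plain dictionaries keyed by (h,k,l) with one intensity each, skips the reduction of symmetry equivalents, and scores each candidate operator with a simple Pearson correlation coefficient. All names are hypothetical.

```python
import math

# The four alternative indexing schemes quoted above for space group P3.
OPERATORS = {
    "h,k,l": lambda h, k, l: (h, k, l),
    "-h,-k,l": lambda h, k, l: (-h, -k, l),
    "k,h,-l": lambda h, k, l: (k, h, -l),
    "-k,-h,-l": lambda h, k, l: (-k, -h, -l),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_reindex(test, reference):
    """Return (operator name, CC) for the scheme matching the reference best."""
    scores = {}
    for name, op in OPERATORS.items():
        # Pair each reindexed test reflection with the same index in the reference.
        pairs = [(i, reference[op(*hkl)]) for hkl, i in test.items()
                 if op(*hkl) in reference]
        if len(pairs) > 2:
            scores[name] = pearson([a for a, _ in pairs], [b for _, b in pairs])
    return max(scores.items(), key=lambda item: item[1])
```

The operator with the highest correlation is then applied to the whole test dataset before scaling it together with the reference.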
POINTLESS
Consistent indexing to a reference file (merged or unmerged). Example in space group H3 (R3 hexagonal setting).
Choices
What scaling model? The scaling model should reflect the experiment; considerations of scaling may affect the design of the experiment.
Is the dataset any good? Should it be thrown away immediately? What is the real resolution? Are there bits which should be discarded (bad images)?
Factors related to crystal and diffracted beam
(e) Absorption in the secondary beam: serious at long wavelength (including CuKα), worth correcting for MAD data.
(f) Radiation damage: serious on high brilliance sources. Not easily correctable unless small, as the structure is changing. Maybe extrapolate back to zero time? The relative B-factor is largely a correction for radiation damage.
Factors related to the detector
The detector should be properly calibrated for spatial distortion and sensitivity of response, and should be stable. Problems with this are difficult to detect from diffraction data.
The useful area of the detector should be calibrated or told to the integration program.
Calibration should flag defective pixels and dead regions, e.g. between tiles.
The user should tell the integration program about shadows from the beamstop, beamstop support or cryocooler (define bad areas by circles, rectangles, arcs etc).
Determination of scales
What information do we have? Scales are determined by comparison of symmetry-related reflections, i.e. by adjusting scale factors to get the best internal consistency of intensities. Note that we do not know the true intensities, and an internally consistent dataset is not necessarily correct. Systematic errors which are the same for symmetry-related reflections will remain.
Minimize
$$\Phi = \sum_{hl} w_{hl} \left( I_{hl} - (1/k_{hl}) \langle I_h \rangle \right)^2$$
I_hl: l'th intensity observation of reflection h
k_hl: scale factor for I_hl
<I_h>: current estimate of I_h
g_hl = 1/k_hl is a function of the parameters of the scaling model:
g_hl = g(φ rotation/image number) . g(time) . g(s) ...other factors
Here g(φ rotation/image number) models the primary beam (s0), g(time) the B-factor, and g(s) the absorption.
The scale is a smooth function of spindle rotation (φ), or a discontinuous function of image (batch) number (usually less appropriate).
[Figure: plots against time and of <I>/sd, comparing AbsCorr with No AbsCorr]
Sample dataset: rotating anode (RU200, Osmic mirrors, Mar345), Cu Kα (1.54 Å), 100 images, 1°, 5 min/°, resolution 1.8 Å
How well are the scales determined? This depends on the strategy of data collection, and thus affects the strategy.
Note that determination of scaling parameters depends on symmetry-related observations having different scales. If all observations of a reflection have the same value of the scale component, then there is no information about that component and it remains as a systematic error in the merged data (this may well be the case for absorption, for instance).
Thus to get intensities with the lowest absolute error, the symmetry-related observations should be measured in as different a way as possible (e.g. rotation about multiple axes). This will increase Rmerge, but improve the estimate of <I>.
Conversely, to measure the most accurate differences for phasing (anomalous or dispersive), observations should be measured in as similar a way as possible.
[Figure: two plots against Irms, labelled Before and After]
Minimises the deviation of Sigma(scatter/SD) from 1.0, i.e. flattens out the plot. Makes the average scatter² equal to the average SD².
What to look at?
A. How well do equivalent observations agree with each other?
1. R-factors: traditional overall measures of quality
(a) $R_{merge}$ ($R_{sym}$) $= \sum_h \sum_l |I_{hl} - \langle I_h \rangle| \,/\, \sum_h \sum_l |\langle I_h \rangle|$
This is the traditional measure of agreement, but it increases with higher multiplicity even though the merged data is better.
(b) $R_{meas} = R_{r.i.m.} = \sum_h \sqrt{n/(n-1)} \sum_l |I_{hl} - \langle I_h \rangle| \,/\, \sum_h \sum_l |\langle I_h \rangle|$
The multiplicity-weighted R-factor allows for the improvement in data with higher multiplicity (n is the number of observations of reflection h). This is particularly useful when comparing different possible point groups (it is output by POINTLESS along with the correlation coefficient, as well as in SCALA).
(c) $R_{p.i.m.} = \sum_h \sqrt{1/(n-1)} \sum_l |I_{hl} - \langle I_h \rangle| \,/\, \sum_h \sum_l |\langle I_h \rangle|$
The precision-indicating R-factor gets better (smaller) with increasing multiplicity, i.e. it estimates the precision of the merged <I>.
Diederichs & Karplus, Nature Structural Biology, 4, 269-275 (1997) Weiss & Hilgenfeld, J.Appl.Cryst. 30, 203-205 (1997)
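The three R-factors differ only in the multiplicity-dependent factor applied to each reflection, as a short Python sketch makes explicit. This is an illustration under simplifying assumptions: observations are grouped in a dictionary keyed by unique hkl, the mean is unweighted, and reflections measured only once are skipped. The function name is hypothetical.

```python
import math

def merging_r_factors(observations):
    """Return (Rmerge, Rmeas, Rpim) for {unique hkl: [I_hl, ...]}."""
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # a single observation carries no agreement information
        mean = sum(intensities) / n
        dev = sum(abs(i - mean) for i in intensities)
        num_merge += dev
        num_meas += math.sqrt(n / (n - 1)) * dev   # multiplicity-weighted
        num_pim += math.sqrt(1 / (n - 1)) * dev    # precision-indicating
        denom += n * abs(mean)
    return num_merge / denom, num_meas / denom, num_pim / denom
```

For any dataset with multiplicity of at least 2, Rpim ≤ Rmerge < Rmeas, which is the behaviour described in (a) to (c).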
Resolution
B. Are some parts of the data bad? Analysis of Rmerge against batch number gives a very clear indication of problems local to some regions of the data. Perhaps something has gone wrong with the integration step, or there are some bad images
Here the beginning of the dataset is wrong due to problems in integration (Mosflm).
A case of severe radiation damage: the B-factor should be small (not more than -10, and even that is large).
Outliers
Detection of outliers is easiest if the multiplicity is high.
Removal of spots behind the backstop shadow does not work well at present: usually it rejects all the good ones, so tell Mosflm where the backstop shadow is.
Scala also has facilities for omitting regions of the detector (rectangles and arcs of circles).
Inspect the ROGUES file to see what is being rejected (at least occasionally).
The ROGUES file contains all rejected reflections (flag "*", "@" for I+- rejects, "#" for Emax rejects)
TotFrc = total fraction, fulls (f) or partials (p)
Flag I+ or I- for Bijvoet classes
DelI/sd = (Ihl - Mn(I)others)/sqrt[sd(Ihl)**2 + sd(Mn(I))**2]
  h  k  l    h  k  l   Batch      I   sigI     E  TotFrc  Flag  Scale     LP  DelI/sd   d(A)    Xdet    Ydet   Phi
 (measured)  (unique)
 -2 -2  0    2  2  0    1220  24941   2756  1.03   0.95p    I-  2.434  0.031     -1.1  30.40  1263.7  1103.2
 -4  2  0    2  2  0    1146   9400   2101  0.63   0.99p   *I+  3.017  0.032     -6.7  30.40  1266.4  1123.3
  4 -2  0    2  2  0    1148  27521   2972  1.08   1.09p    I-  2.882  0.032      0.0  30.40  1058.8  1130.0
  2 -4  0    2  2  0    1075  29967   2865  1.13   0.92p    I+  2.706  0.032      1.1  30.40  1060.9  1106.6
 Weighted mean                27407
Cause of outlier: remedy
outside reliable area of detector (e.g. behind shadow): specify backstop shadow, calibrate detector
ice spots: do not get ice on your crystal!
zingers
bad prediction (spot not there): improve prediction
spot overlap: deconvolute overlaps
multiple lattices: find a single crystal
[Figure: ice rings; rejects lie on ice rings (red) (ROGUEPLOT in Scala)]
Are the differences greater than would be expected from the errors? Test using a normal probability plot of D(observed) against D(expected): a slope > 1.0 means a significant difference.
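A normal probability plot compares the sorted normalised differences with the quantiles expected for a unit normal distribution. The sketch below is one possible way to compute its slope, not the method of any particular program: the plotting positions (i + 0.5)/n and the least-squares line through the origin are assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist

def normal_probability_slope(deltas):
    """Slope of sorted observed differences against expected normal quantiles.

    deltas: normalised differences, e.g. (I1 - I2) / sqrt(sd1**2 + sd2**2).
    """
    observed = sorted(deltas)
    n = len(observed)
    # ASSUMPTION: plotting positions (i + 0.5) / n for the expected quantiles.
    expected = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    # Least-squares slope of a line through the origin.
    sxy = sum(e * o for e, o in zip(expected, observed))
    sxx = sum(e * e for e in expected)
    return sxy / sxx
```

Differences that are explained by the estimated errors give a slope close to 1.0; a larger slope indicates a real signal (or underestimated errors).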
[Figure: normal probability plots, D(observed) against D(expected), for Peak, Edge and Remote data; reference panel: centric, no anomalous. Further plots against resolution.]
This can be used to set the useful resolution for finding anomalous scatterers.
Correlated differences: the ratio of the distribution width along the diagonal to the width across it ~= signal/noise.
[Figure: uncorrelated differences for native data; plot against resolution]
[Figure: cumulative intensity distribution N(Z) against Z = I/<I>, showing the theoretical curve, a sigmoidal curve and a hypercentric curve]
Too many weak reflections (usually) arise because the average <I> is inappropriate:
Anisotropic diffraction: <I> in resolution shells is wrong.
Translational NCS: whole classes of reflections are weak and should be compared to their own average, e.g. NCS ~ (1/2, 0, 0) makes h odd reflections weak.
Merohedral twinning (exact overlap of lattices) is possible if the true point group is lower symmetry than the lattice point group. Intensity statistics show too few weak reflections (and too few strong ones).
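The cumulative intensity distribution used for these tests is easy to compute. The sketch below gives the standard theoretical curves for untwinned acentric and centric data and for a perfect twin (twin fraction 0.5), plus an observed N(Z). It is a simplification: the observed curve normalises by a single overall mean rather than by means in resolution shells, and the function names are hypothetical.

```python
import math

def acentric_nz(z):
    """Theoretical N(Z) for untwinned acentric reflections."""
    return 1.0 - math.exp(-z)

def centric_nz(z):
    """Theoretical N(Z) for untwinned centric reflections."""
    return math.erf(math.sqrt(z / 2.0))

def twinned_acentric_nz(z):
    """Theoretical N(Z) for acentric reflections from a perfect twin."""
    return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)

def observed_nz(intensities, z):
    """Fraction of reflections with I/<I> <= z (single overall mean)."""
    mean = sum(intensities) / len(intensities)
    return sum(1 for i in intensities if i / mean <= z) / len(intensities)
```

At small Z the twinned curve lies well below the untwinned acentric curve (the sigmoidal shape: too few weak reflections), whereas an excess of weak reflections pushes the observed curve above it.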
Apparent point group 422, a = 64.3 Å, c = 198.8 Å, dimer/ASU, 35 kDa, 2.0 Å resolution. 422 has no possibility of twinning, so the true point group must be lower (4).
Structure solved on an untwinned crystal by MAD. Molecular replacement (difficult, large conformational change). Refined in CNS 1.1 with α = 0.5, P41.
Another case: pseudo-merohedral twinning
Unit cell: 79.2, 81.3, 81.2 Å; 90°, 90°, 90°
True space group: P212121
Pseudo-merohedral twinning into point group P422 (twin operator k,h,-l): 79.2 = a ≈ b = 81.3 (not very close!)
Solved by SeMet MAD at 3.1 Å resolution, ignoring twinning.
Model refined with 20% twinning in CNS at 2.6 Å resolution.
[Figure: cumulative intensity distribution (acentric and centric); curves Acent_theor, Acent_obser, Centric_theor, Centric_obser]
Acknowledgements
Andrew Leslie: slides
Mosflm team: Andrew Leslie
Present: Harry Powell (mosflm), Luke Kontogiannis (imosflm)
Past: Geoff Battye (imosflm)
Pointless:
Ralf Grosse-Kunstleve: cctbx
Kevin Cowtan: clipper, simplex, C++ advice
Martyn Winn & CCP4 gang: ccp4 libraries
Peter Briggs: ccp4i
Airlie McCoy: C++ advice, code etc