Overview
DNA Analyst Bob Blackett has graciously provided The Biology Project with sample data from his own work. In this activity, you will learn the concepts and techniques behind DNA profiling of the 13 core CODIS "Short Tandem Repeat" loci used for the national DNA databank. You will then have the opportunity to collect and interpret actual STR data, and to answer one or more of the following questions:
1. How is STR data used in a DNA Paternity Test?
2. How can STR data from close relatives be used to create a genetic profile of a missing person?
3. How much genetic diversity exists among siblings?
4. How does one calculate the probability for a specific DNA profile?
Alternatively, you may wish to create your own activities, based on some suggestions for open-ended inquiry that are offered below.
This activity is aimed at students with a basic knowledge of DNA structure, Mendelian genetics, and human pedigree analysis. A good preparation for this activity would be to review our problem sets and tutorials in Human Biology.
Create a Blackett Family Pedigree
Collecting STR DNA profile data
Paternity Testing with STR Data
DNA Profile of a "Missing Person"
DNA Profile Frequency Calculations
D7S820
D7S820 is one of the 13 core CODIS STR genetic loci. This DNA is found on human chromosome 7. The DNA sequence of a representative allele of this locus is shown below. This sequence comes from GenBank, a public DNA database. The tetrameric repeat sequence of D7S820 is "gata". Different alleles of this locus have from 6 to 15 tandem repeats of the "gata" sequence. How many tetrameric repeats are present in the DNA sequence shown below? Notice that one of the tetrameric sequences is "gaca", rather than "gata".
  1 aatttttgta ttttttttag agacggggtt tcaccatgtt ggtcaggctg actatggagt
 61 tattttaagg ttaatatata taaagggtat gatagaacac ttgtcatagt ttagaacgaa
121 ctaacgatag atagatagat agatagatag atagatagat agatagatag atagacagat
181 tgatagtttt tttttatctc actaaatagt ctatagtaaa catttaatta ccaatatttg
241 gtgcaattct gtcaatgagg ataaatgtgg aatcgttata attcttaaga atatatattc
301 cctctgagtt tttgatacct cagattttaa ggcc
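The repeat count can also be checked by machine. The following is a minimal sketch, not part of the original activity: it strips the position numbers from the sequence listing and reports the longest uninterrupted run of a repeat unit. The helper names `clean_sequence` and `longest_tandem_run` are hypothetical, chosen here for illustration.

```python
import re

# Sequence text as displayed above, including the position numbers.
RAW = """
1 aatttttgta ttttttttag agacggggtt tcaccatgtt ggtcaggctg actatggagt
61 tattttaagg ttaatatata taaagggtat gatagaacac ttgtcatagt ttagaacgaa
121 ctaacgatag atagatagat agatagatag atagatagat agatagatag atagacagat
181 tgatagtttt tttttatctc actaaatagt ctatagtaaa catttaatta ccaatatttg
241 gtgcaattct gtcaatgagg ataaatgtgg aatcgttata attcttaaga atatatattc
301 cctctgagtt tttgatacct cagattttaa ggcc
"""


def clean_sequence(raw):
    """Drop position numbers and whitespace, keeping only the bases a, c, g, t."""
    return re.sub(r"[^acgt]", "", raw.lower())


def longest_tandem_run(seq, unit):
    """Number of copies of `unit` in its longest uninterrupted tandem run."""
    runs = re.findall(rf"(?:{re.escape(unit)})+", seq)
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)


sequence = clean_sequence(RAW)
gata_repeats = longest_tandem_run(sequence, "gata")
print("sequence length:", len(sequence))
print("longest run of 'gata':", gata_repeats)
```

The sketch counts only exact "gata" units, so the variant "gaca" tetramer ends the run. Whether that tetramer should be added to the allele's repeat number is a question of nomenclature that the code does not settle.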
The CODIS system has been widely adopted by forensic DNA analysts.
STR alleles can be rapidly determined using commercially available kits.
STR alleles are discrete, and behave according to known principles of population genetics.
The data are digital, and therefore ideally suited for computer databases.
Laboratories worldwide are contributing to the analysis of STR allele frequency in different human populations.
STR profiles can be determined with very small amounts of DNA.
Locus     Genotype   Frequency
D8S1179   12, 13     9.9%
D21S11    29, 31     2.3%
D18S51    12, 13     4.3%
D5S818    11, 13     13%
THO1      9, 9.3     9.6%
TPOX      8, 8       3.52%
CSF1PO    11, 11     7.2%
AMEL      XY (Male)
For each genetic locus, Bob has determined his "genotype" and the expected frequency of that genotype in a representative population sample. For example, at the genetic locus known as "D3S1358", Bob has the genotype "15, 18". This genotype is shared by about 8.2% of the population. By combining the frequency information for all 13 CODIS loci, Bob can calculate that the frequency of his profile would be 1 in 7.7 quadrillion Caucasians (1 in 7.7 times 10 to the 15th power)! In Bob's forensic DNA work, he often compares the DNA profile of biological evidence from a crime scene with a known reference sample from a victim or suspect. If any two samples have matching genotypes at all 13 CODIS loci, it is a virtual certainty that the two DNA samples came from the same individual (or an identical twin).
In the partial results shown above, the three STRs D3S1358, vWA, and FGA are being analyzed simultaneously. The lengths of the amplified DNAs are shown by the scale from 100 bp to 280 bp at the top of the figure. The middle panels with multiple peaks are reference standards with the known alleles for each STR locus. Notice that the alleles for the three different loci do not overlap. The lower panel shows the alleles for Bob Blackett's mother Norma for the D3S1358, vWA, and FGA loci. Norma's alleles have been compared by computer to the reference standards, and labeled. Reading this result, Norma's genotype is 15, 15 at the locus D3S1358, 14, 16 at vWA, and 24, 25 at FGA.
3. Detection of DNAs after PCR Amplification
The PCR primers in the commercial kits used for STR analysis have fluorescent molecules covalently linked to the primer. To extend the number of different loci that can be analyzed in a single PCR reaction, multiple sets of primers with different "color" fluorescent labels are used. Following the PCR reaction, internal DNA length standards are added to the reaction mixture and the DNAs are separated by length in a capillary gel electrophoresis machine. As DNA peaks elute from the gel they are detected with laser activation. The sequencing machines used for allele separation and detection are the same type currently being used in the Human Genome Sequencing project, with digital output that can be analyzed by special computer software.
In the AmpFLSTR Profiler Plus PCR Amplification Kit from Applied Biosystems used by Bob Blackett, 9 STRs are analyzed by using three sets of primers. Each set has a different colored fluorescent label. In the figure above, three STRs are represented by blue, three by green, and three by yellow (shown as black) fluorescent peaks. The red peaks are the DNA size standards. Special computer software is used to display the different colors as separate panels of data and determine the exact length of the DNAs. A tenth marker called AMEL is used to distinguish male DNA (X, Y) from female DNA (X, X). A second kit, called Cofiler Plus, is used in a second PCR reaction to amplify 4 additional STR loci, plus repeat some of the loci from the Profiler Kit. The result from the 2 PCR reactions is the analysis of the entire CODIS set of 13 STRs, with overlap of some loci, and a test for the sex chromosomes. The results are obtained as discrete, digital alleles determined from the exact size of the amplified products compared to known standards.
Allele. One of the different forms of a gene. Different STR repeat lengths represent different alleles at a genetic locus, e.g. 8 and 9 are different alleles of the THO1 locus.
Locus. The position on a specific chromosome where the different alleles of a genetic marker are located. The plural is loci.
Monohybrid Cross. Genetic cross involving parents differing in only one trait. Inheritance of each of the 13 STR loci can be treated as a separate Monohybrid Cross.
Genotype. The genetic composition of the alleles at a locus. Since we are diploid, we each have two alleles at each locus.
Homozygous. Both alleles at a locus are the same, e.g. Fred has a genotype of 29, 29 at the D21S11 locus.
Heterozygous. Alleles at a locus are not the same, e.g. Norma has a genotype of 29, 31 at the D21S11 locus.
Multiple Allelic Series. Many different alleles at a locus, e.g. the known alleles at the vWA locus are 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, and 21.
Punnett Square. A diagram used to determine all possible genotypes that can occur in a genetic cross. All of the diagrams on this page are Punnett Squares.
Here are some examples of how STR data can be interpreted in a family DNA study. The numbers outside the Punnett Squares are the parental alleles that can be present in the egg or sperm of the parents. The numbers inside the squares are the genotypes possible for the resulting children.
Case 1
If the genotypes of both parents are known, we use a Punnett Square to predict the possible genotypes of their offspring. Each child inherits one allele of a given locus from each parent. Panel (a) - At the D21S11 locus, the children of Bob Blackett and wife Anne can have four different genotypes. Son David is 28, 31. Daughter Katie is 29, 30. Panel (b) - Bob Blackett inherited the 31 allele from his mother, Norma. Therefore the 29 allele is paternal. If Bob's paternal allele was not 29, what would be your conclusion?
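The Punnett Square in Case 1 can be sketched in a few lines of code. This is an illustration only: the function name `punnett` is hypothetical, and Anne's genotype of 28, 30 is not stated in the text but inferred here from Bob's genotype (29, 31) and the genotypes of David and Katie.

```python
from itertools import product


def punnett(parent_a, parent_b):
    """Return the set of child genotypes, each as a sorted pair of alleles."""
    return {tuple(sorted(pair)) for pair in product(parent_a, parent_b)}


bob = (29, 31)   # D21S11 genotype given in the text
anne = (28, 30)  # assumption: inferred from the children's genotypes

possible = punnett(bob, anne)
children = {"David": (28, 31), "Katie": (29, 30)}

print("possible child genotypes:", sorted(possible))
for name, genotype in children.items():
    print(name, genotype, "consistent" if genotype in possible else "not consistent")
```

A child whose genotype falls outside the set returned by `punnett` would be inconsistent with the stated parents at that locus, which is the reasoning a paternity test applies locus by locus.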
Case 2
If the genotypes of a mother and several children are known, it is often possible to unambiguously predict the genotype of the father. In this case, Karen is the mother with a genotype of 9, 9.3 at the THO1 locus. From the Punnett Square we can determine that the paternal alleles of Tiffany, Melissa, and Amanda are 8, 9.3, and 9.3, respectively. Therefore, their father Steve must have a genotype of 8, 9.3. If the three daughters had three different paternal alleles, what would be your conclusion?
Case 3
Sometimes only one allele of the father can be predicted when the genotypes of a mother and several children are known. In this example, the genotype of Karen, the mother, is 16, 17 at the D18S51 locus. The genotypes of the daughters are either 16, 18 or 17, 18. In each case, Melissa, Tiffany, and Amanda inherited the 18 allele from their father, Steve. We cannot determine whether Steve is homozygous (18, 18) or heterozygous (18, ?), where the ? means any other allele.
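The reasoning in Case 3 amounts to subtracting the allele the mother could have contributed from each child's genotype. A minimal sketch follows, with a hypothetical helper name `paternal_candidates`. The text does not say which daughter has which genotype, so the two observed genotypes are used without names.

```python
def paternal_candidates(mother, child):
    """Alleles the father could have contributed, given the mother's genotype.

    For each allele of the child that the mother could have supplied,
    the child's other allele must have come from the father.
    """
    first, second = child
    candidates = set()
    if first in mother:
        candidates.add(second)
    if second in mother:
        candidates.add(first)
    return candidates


karen = (16, 17)                           # D18S51 genotype from the text
daughter_genotypes = [(16, 18), (17, 18)]  # the two genotypes seen among the daughters

paternal = set()
for genotype in daughter_genotypes:
    paternal |= paternal_candidates(karen, genotype)

print("paternal alleles observed:", sorted(paternal))
```

When only one paternal allele turns up, as here, the father's second allele stays unknown. An empty candidate set for any child would mean that the child's genotype cannot be explained by the stated mother at that locus.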
Case 4
Is it possible to determine parental genotypes when only the genotypes of their children are known? Consider the case of Bob Blackett's 4 first cousins, Marilyn, Buddy, Dick and Janet. Bob did not have DNA samples for their parents, Bud and Louise, who are both deceased. In a real forensic case, Bud and Louise might represent "missing persons". In panel (a) we arrange the 3 known genotypes of the 4 children. In panel (b) we predict the only two parental genotypes that can account for the children. Note that we cannot determine which genotype goes with which parent.
Case 5
A variation on Case 4 is when there are only two genotypes known for the children, and both parental genotypes must be predicted. Panel (a) - Marilyn and Janet are 15, 16 at the locus D3S1358. Buddy and Dick are 18, 18. Panel (b) - The only parental genotypes that can give this result are 15, 18 and 16, 18. Once again, we cannot predict which parent has which genotype.
Case 6
Sometimes the parental genotypes cannot be predicted unambiguously from the genotypes of their children. Marilyn is 16, 17 at the locus vWA. Buddy, Dick, and Janet are 16, 18. What are the parental genotypes? Panel (a) - One interpretation is that the parents are 16, 18 and 16, 17. Panel (b) - Another possibility is that one parent is 17, 18 and the other is 16, ?, where ? is any allele.
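Cases 5 and 6 can be explored by brute force: list every pair of parental genotypes and keep the pairs whose Punnett Square contains all of the children's genotypes. The sketch below is one way to do this, with hypothetical function names. As a simplifying assumption it only tries alleles that were observed in the children, so a "?" allele that no child inherited is not enumerated.

```python
from itertools import combinations_with_replacement, product


def punnett(parent_a, parent_b):
    """Set of possible child genotypes, each as a sorted pair of alleles."""
    return {tuple(sorted(pair)) for pair in product(parent_a, parent_b)}


def parental_pairs(children, alleles):
    """Unordered pairs of parental genotypes that can produce every child."""
    genotypes = list(combinations_with_replacement(sorted(alleles), 2))
    needed = {tuple(sorted(child)) for child in children}
    return [
        (p1, p2)
        for p1, p2 in combinations_with_replacement(genotypes, 2)
        if needed <= punnett(p1, p2)
    ]


# Case 5 (D3S1358): children are 15, 16 and 18, 18.
case5 = parental_pairs([(15, 16), (18, 18)], {15, 16, 18})

# Case 6 (vWA): children are 16, 17 and 16, 18.
case6 = parental_pairs([(16, 17), (16, 18)], {16, 17, 18})

print("Case 5 solutions:", case5)
print("Case 6 solutions:", case6)
```

A single solution corresponds to the unambiguous situation of Case 5, while several solutions correspond to the ambiguity described in Case 6. The pairs are unordered, reflecting the point that the data cannot say which genotype belongs to which parent.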
Part of the work of forensic DNA analysis is the creation of population databases for the STR loci studied. Probability calculations are based on knowing allele frequencies for each STR locus for a representative human population (and showing Hardy-Weinberg equilibrium for the population by statistical tests). Allele frequency is defined as the number of copies of the allele in a population divided by the sum of all alleles in a population. For a heterozygous individual, if the two alleles have frequencies of p and q in a population, the probability (P) of an individual having both alleles at a single locus is
P = 2pq
If an individual is homozygous for an allele with a frequency of p, the probability (P) of the genotype is
P = p^2
We saw earlier that Bob Blackett has the genotype 15, 18 at the locus D3S1358. In a reference database of 200 U.S. Caucasians, the frequency of the alleles 15 and 18 was 0.2825 and 0.1450, respectively. The frequency of the 15, 18 genotype is therefore
P = 2 (0.2825) (0.1450) = 0.0819, or 8.2%.
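The two genotype formulas can be combined into one small function. This is a minimal sketch assuming Hardy-Weinberg equilibrium, as the text does. The function name `genotype_frequency` and the dictionary layout are illustrative choices, and only the two allele frequencies quoted above are included.

```python
def genotype_frequency(genotype, allele_freqs):
    """Expected genotype frequency: p^2 if homozygous, 2pq if heterozygous."""
    first, second = genotype
    if first == second:
        return allele_freqs[first] ** 2
    return 2 * allele_freqs[first] * allele_freqs[second]


# D3S1358 allele frequencies quoted in the text (200 U.S. Caucasians).
d3s1358 = {15: 0.2825, 18: 0.1450}

bob = genotype_frequency((15, 18), d3s1358)
print(f"15, 18 genotype: {bob:.4f} ({bob:.1%})")

# For comparison, a hypothetical homozygous 15, 15 individual.
homozygous = genotype_frequency((15, 15), d3s1358)
print(f"15, 15 genotype: {homozygous:.4f}")
```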
If the alleles at different loci can be shown by appropriate statistical tests to be inherited independently, the probability for the combined genotype can be determined by multiplication (the product rule). The probability (P) for a DNA profile is the product of the probabilities (P1, P2, ... Pn) for each individual locus, i.e.
Profile Probability = (P1) (P2) ... (Pn)
The probability can be an extremely low number when all 13 CODIS STR markers are included in the DNA profile. As mentioned earlier, Bob Blackett calculated his own profile probability at 1.3 times 10^-16, or no more frequent than 1 in 7.7 quadrillion individuals (7.7 million billion), which is more than a million times the population of the planet.
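The product rule is easy to try out numerically. The sketch below multiplies the eight single-locus genotype frequencies quoted for Bob earlier on this page (8.2% for D3S1358 plus the seven percentages in the partial genotype table). Because this is only 8 of the 13 loci, the result is a partial profile probability and is not expected to reproduce the 13-locus figure. The function name `profile_probability` is illustrative.

```python
import math


def profile_probability(locus_probabilities):
    """Product rule: multiply the genotype probabilities of independent loci."""
    return math.prod(locus_probabilities)


# Genotype frequencies quoted for Bob at 8 loci (an incomplete, 8-locus profile).
partial_profile = [0.082, 0.099, 0.023, 0.043, 0.13, 0.096, 0.0352, 0.072]

partial = profile_probability(partial_profile)
print(f"8-locus probability: {partial:.2e} (about 1 in {1 / partial:.2e})")

# The 13-locus figure from the text, expressed both ways.
full = 1 / 7.7e15
print(f"1 in 7.7 quadrillion = {full:.1e}")
```

Each additional independent locus multiplies the probability by another small number, which is why the 13-locus value ends up so many orders of magnitude below the 8-locus one.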
Melissa: Daughter of Karen and Steve
Amanda: Daughter of Karen and Steve
Louise: Sister of Fred; Bob's Aunt
Bud: Husband of Louise
Buddy: Son of Bud and Louise
Dick: Son of Bud and Louise
Janet: Daughter of Bud and Louise
View the Pedigree in a new web page, or download the pdf version. The pdf version, which requires Adobe Acrobat reader for display and printing, might be useful for taking notes during the ensuing activities.
Collecting STR DNA profile data
STR Data for the Blackett Family
These data are from the actual DNA analysis of the Blackett family members by Bob Blackett. The tracings below show the genotypes for three of the 13 CODIS STR loci. In this activity, you will record the data for use in the ensuing genetic analysis of the Blackett family. Data on the other 10 loci will be provided later.
Collect the data for Bob, Anne, David, Katie, Fred and Norma for the "Paternity Testing with STR" Activity.
Collect the data for Karen, Tiffany, Melissa, and Amanda for the "DNA Profile of a Missing Person" Activity.
You will not need to collect the results for Buddy, Dick, Marilyn and Janet. They are provided for you to create your own activity, e.g. can you draw any conclusions about Louise and Bud?
You may wish to collect data in your own databook. If you would like to use partially completed spreadsheets to speed up your data collection, select from the following shortcuts:
Bob, Anne, David, Katie, Fred and Norma either in Table or pdf format. Karen, Tiffany, Melissa, and Amanda either in Table or pdf format. Completed data for Buddy, Dick, Marilyn, and Janet in Table or pdf format.
Note: In combining all of the individual profiles into a composite diagram for this activity, the tracings were digitally modified for illustrative purposes.
Go immediately to the questions below and interpret the data you have already collected.
Review the principles of genetics needed for this activity.
Use the data that we have collected for you or, if you prefer, use the data that you have already collected.
Download a worksheet for this activity in PDF format with the data that we have collected for you.
Choose from among the following questions to test your understanding of human genetics.
1. Who are the parents of David and Katie?
Do all of the data you have collected on the genotypes of Bob, Anne, Katie, and David support the conclusion that Bob and Anne are the biological parents of David and Katie? You should justify your answer by reference to the specific genotypes for the STR loci.
2. What is the genetic legacy of Fred and Norma?
The alleles that Bob passes on to his children have in turn been inherited from Bob's parents, Fred and Norma. Identify the alleles among the 13 CODIS STR loci in the genotypes of Katie and David that have been unambiguously inherited from each of their paternal grandparents. Now identify any additional alleles that might have been inherited from their paternal grandparents.
3. Genetic Diversity and Sexual Reproduction
Human geneticists are often asked why children have not inherited a particular trait from their parents. As a human geneticist, you know that one mechanism to ensure genetic diversity is the independent assortment of alleles of different loci during gamete (egg and sperm) production, i.e. Mendel's Second Law of Genetics. To illustrate this important genetic principle, calculate how many genotypes would be possible among the children of Bob and Anne for the combined DNA profile from the D3S1358, vWA, and FGA loci. If you feel really ambitious, now calculate the possible genotypes of the children of Bob and Anne for all 13 CODIS STR loci.
4. How many genotypes are possible in a population for a three locus DNA Profile?
If there are two alleles, A and B, at a genetic locus in a population, there are three possible genotypes, namely AA, BB, and AB. If there are three alleles, A or B or C, there are six possible genotypes, namely AA, BB, CC, AB, AC, and BC. For N different alleles, the total number of possible genotypes is given by the following expression:
N (N + 1) / 2
If we assume that the allele reference ladders from our data collection exercise represent all possible alleles (a conservative estimate), how many genotypes are possible in a population for the combined STR loci of D3S1358, vWA, and FGA?
5. How many genotypes are possible in a population for the combined CODIS 13 STR loci?
If you feel really ambitious, you may wish to calculate the number of possible genotypes considering all 13 CODIS STR markers. The table below shows the number of alleles for each locus. Beware, the number will be very large.
Locus     Alleles
D3S1358   8
vWA       11
FGA       14
D8S1179   12
D21S11    22
D18S51    21
D5S818    10
THO1      7
TPOX      8
CSF1PO    10
AMEL      XY
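The counting in Questions 4 and 5 can be set up in code. The sketch below assumes the standard count of unordered allele pairs, N(N + 1)/2, which agrees with the two-allele and three-allele examples above (3 and 6 genotypes). It uses the allele counts listed in the table for D3S1358, vWA, and FGA, and the names `genotype_count` and `allele_counts` are illustrative.

```python
import math


def genotype_count(n_alleles):
    """Number of distinct genotypes at one locus with n_alleles alleles.

    N homozygous genotypes plus N(N - 1)/2 heterozygous ones gives N(N + 1)/2.
    """
    return n_alleles * (n_alleles + 1) // 2


# Allele counts for the three loci of Question 4, taken from the table above.
allele_counts = {"D3S1358": 8, "vWA": 11, "FGA": 14}

per_locus = {locus: genotype_count(n) for locus, n in allele_counts.items()}
combined = math.prod(per_locus.values())

for locus, count in per_locus.items():
    print(f"{locus}: {count} genotypes")
print("combined three-locus genotypes:", combined)
```

Extending `allele_counts` with the remaining loci from the table gives the multi-locus total for Question 5. Python integers do not overflow, so the very large result is computed exactly.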
Go immediately to the questions below and interpret the data you have already collected.
Review the principles of genetics needed for this activity.
Skip the data collection, and use the data that we have collected for you for Question #1.
Download a worksheet for this activity in PDF format with the data that we have collected for you for Question #1.
Completed data for Buddy, Dick, Marilyn, and Janet in Table or pdf format for Question #2.
1. What is Steve's Genotype?
In our activity, we obtained data for Karen and her three daughters, Tiffany, Melissa, and Amanda. Bob Blackett has not yet had the opportunity to test the DNA of Steve, so Steve can play the role of the "missing person" in our activity. Determine Steve's genotype at the 13 CODIS STR loci. For each locus, indicate whether the genotype is unambiguous, with both alleles known, or whether there is some uncertainty about one or both paternal alleles.
2. What are the Genotypes of Bud and Louise?
What happens when we have two missing people? Human geneticists are often asked to determine if adult children in the same family all have the same biological parents. Demonstrate that all of the genetic information for the children of Bud and Louise is consistent with all 4 having the same two parents.
Allele   Frequency
12       0.015
14       0.1311
15       0.1189
16       0.186
17       0.2774
18       0.189
19       0.0884
20       0.015
FGA
Allele   Frequency
18       0.015
19       0.061
20       0.125
21       0.1799
22       0.2287
23       0.1311
24       0.1463
25       0.0945
26       0.0183
27       0.015
1. What is the Probability for a 3-Locus DNA profile?
Based on a population database of Caucasians developed by Bob Blackett and colleagues in Arizona, Bob can calculate the genotype frequency of his combined profile for the three STR loci D3S1358, vWA, and FGA to be 6 x 10^-5. Compare this frequency with the frequency you calculate from the Royal Canadian Mounted Police data. For help with this calculation, review the DNA Profile Frequency Calculation page.
2. Genotype Frequency for the 13 CODIS STR Loci.
If you feel really ambitious, retrieve additional frequency data for the other 10 STR loci from the web site of the Royal Canadian Mounted Police and calculate the genotype frequency for all 13 STR loci for one or more individuals from this study.
3. Check your answers.
As an alternative to doing all of the arithmetic yourself, you can calculate a profile's Random Match Probability using the RCMP on-line calculator.