
MENDELIAN POPULATIONS

Sept. 5, 2012

Broadly speaking, population genetics is the study of the origin, transmission, and distribution of
genetic variation within and among populations. Practical applications of population genetics
are extensive, and include such areas as genetic counseling, gene mapping, animal breeding, and
forensics. Every serious student of genetics should have a basic understanding of population
genetics. Like most sciences, population genetics relies on observational, experimental, and
theoretical approaches. Theoretical population genetics relies on mathematical models. These
models form a basis for understanding processes that cannot be otherwise observed. Although
the theory of population genetics is largely mathematical, we have made a concerted effort to
introduce topics in such a way that the assumptions can be understood and the main results
appreciated without much mathematics. The level of mathematics generally required is at the
level of high school or college algebra, including a bit of elementary statistics and probability.
Here we focus on how to quantify the amount of genetic variation in a population. Later we will
address the factors that influence the amount and distribution of genetic variation, including
random sampling (genetic drift), mutation, natural selection, and systems of mating.
Genetic Terminology
A population is a group of interbreeding individuals who exist together in time and space. In
practice, it does not usually refer to an entire species since population boundaries (typically
geographical or cultural) make it less likely for some groups of individuals to mate and
reproduce. In fact, it is often difficult to identify who does and does not belong to a given
population. In the words of Jim Crow, "A real population includes individuals of all ages, some
dying, some being born, some choosing mates, some reproducing, some migrating, some simply
growing old." Fortunately, most questions in population genetics do not require such detail. Our
operational definition of a population is generally sufficient to derive the relationships between
genetic processes and outcomes. With these considerations, it is useful to think of populations as
groups of individuals who share a common gene pool. The gene pool is simply the aggregate of
all genes in a population without reference to the individuals who carry them. In the gene-pool
model, random mating of diploid individuals is equivalent to random sampling from a pool of
genes.
Some of the terminology commonly used in genetics today is inconsistent. For example, the
distinctions between the terms gene, locus, and allele are not always clear. One of the primary
reasons for this is that many genetic terms were introduced long before the discovery of DNA
and knowledge of its structure and function. For the sake of clarity, we will make use of the
following definitions. A gene is a nucleotide sequence that encodes information for a specific
product. A locus is defined as a specific position in the genome. Alleles refer to alternative
nucleotide sequences at a locus. With this convention, a diploid individual has two alleles at
each autosomal locus, one from his/her mother and the other from his/her father.
Genotype and Allele Frequencies
One of the most fundamental quantities in population genetics is the allele frequency. The
primary reason for this is that genotypes are not passed from one generation to the next intact,
but rather are broken apart by meiosis and reconstructed by mating. A secondary reason, albeit
less important, is that there are many more potential genotypes than alleles at a locus. For
example, there are 3 possible genotypes for a 2-allele locus, 6 possible genotypes for a 3-allele
locus, and so on. To convince you of this, we leave it as an exercise to show that the number of
possible genotypes for a single locus with m alleles equals m(m + 1)/2.
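The genotype count can be checked by brute force before attempting the proof. The sketch below is only an empirical check, not a derivation; the helper name count_genotypes and the allele labels are illustrative assumptions. It treats a genotype as an unordered pair of alleles, so A1A2 and A2A1 are counted once.

```python
from itertools import combinations_with_replacement


def count_genotypes(m):
    """Count diploid genotypes (unordered allele pairs) at a locus with m alleles."""
    alleles = [f"A{i}" for i in range(1, m + 1)]
    return len(list(combinations_with_replacement(alleles, 2)))


# Compare the enumeration with the closed form m(m + 1)/2 for small m.
for m in range(1, 7):
    assert count_genotypes(m) == m * (m + 1) // 2

print([count_genotypes(m) for m in (2, 3, 4)])  # prints [3, 6, 10]
```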

Allele frequencies are usually estimated from observations on phenotypes or genotypes. With
co-dominant alleles, the number of copies of each allele in a sample can be counted directly.
This process is illustrated in Tables 1a and 1b for the Taq1A restriction site polymorphism in the
DRD2 dopamine receptor gene. For this locus, there are three genotypes, A1A1, A1A2, and A2A2.
Among a sample of 459 members of a Southwestern American Indian tribe, the observed
numbers of individuals with genotypes A1A1, A1A2, and A2A2 were 165, 211, and 83,
respectively.
Table 1a. Genotype frequencies for the DRD2 Taq1A polymorphism in a
Southwestern American Indian tribe (Goldman et al. 1997)

Genotype   Number   Frequency
A1A1       165      0.359
A1A2       211      0.460
A2A2       83       0.181
Total      459      1.0

In this sample of 459 individuals, there are a total of 918 alleles. Each of the A1A1 homozygotes
carries two copies of the A1 allele, and each of the A1A2 heterozygotes carries one copy of the A1
allele. Therefore, there are 2(165)+211=541 copies of the A1 allele, and the frequency of the A1
allele is 541/918=0.589. Similarly, there are 211+2(83)=377 copies of the A2 allele, and the
frequency of the A2 allele is 377/918=0.411. As expected, the sum of the allele frequencies is
1.0.
Table 1b. Allele frequencies for DRD2 Taq1A

Allele   Number   Frequency
A1       541      0.589
A2       377      0.411
Total    918      1.0
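The allele-counting arithmetic for the DRD2 sample can be reproduced in a few lines. This is a minimal sketch; the function name allele_frequencies is an assumption made for illustration, and the counts are those reported in Table 1a.

```python
def allele_frequencies(n11, n12, n22):
    """Return (p1, p2) by direct allele counting at a two-allele locus."""
    total_alleles = 2 * (n11 + n12 + n22)  # each diploid individual carries two alleles
    n1 = 2 * n11 + n12                     # copies of A1
    n2 = n12 + 2 * n22                     # copies of A2
    return n1 / total_alleles, n2 / total_alleles


# Genotype counts for A1A1, A1A2, A2A2 from Table 1a.
p1, p2 = allele_frequencies(165, 211, 83)
print(round(p1, 3), round(p2, 3))  # prints 0.589 0.411
```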

Population genetics is very quantitative, and as such, relies on a certain degree of abstraction.
Here we introduce the notation for allele and genotype numbers and frequencies. We start with
the single 2-allele locus given in Table 2a. By convention, N and n indicate numbers (or counts)
and P and p indicate frequencies (or probabilities). Double subscripts indicate genotypes and
single subscripts indicate alleles.

Table 2a. General notation for genotype frequencies at a two-allele locus

Genotype   Number                 Frequency
A1A1       N11                    P11 = N11/N
A1A2       N12                    P12 = N12/N
A2A2       N22                    P22 = N22/N
Total      N = N11 + N12 + N22    1.0

Table 2b. Allele frequencies by direct allele counting

Allele   Number             Frequency
A1       n1 = 2N11 + N12    p1 = n1/(2N)
A2       n2 = N12 + 2N22    p2 = n2/(2N)
Total    n1 + n2 = 2N       1.0

In this notation, the frequency of allele A1 is given by

\[ p_1 = \frac{n_1}{2N} = \frac{2N_{11} + N_{12}}{2(N_{11} + N_{12} + N_{22})} = P_{11} + \frac{1}{2} P_{12} . \]

This notation is easily extended to a locus with multiple alleles. To see this, let Pii denote the
frequency of genotype AiAi and Pij denote the frequency of genotype AiAj. Then the frequency
of allele Ai is given by

\[ p_i = \frac{n_i}{2N} = \frac{2N_{ii} + \sum_{j=1,\, j \neq i}^{n} N_{ij}}{2N} = P_{ii} + \frac{1}{2} \sum_{j=1,\, j \neq i}^{n} P_{ij} , \]

where n is the number of alleles at the locus and (in the summation) j ≠ i.
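The multi-allele formula amounts to counting each allele once per heterozygote and twice per homozygote. The sketch below assumes genotype counts are stored in a dictionary keyed by allele pairs; the function name and the three-allele counts are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from collections import Counter


def allele_frequencies_multi(genotype_counts):
    """Allele frequencies from a dict mapping (allele_a, allele_b) -> number of individuals."""
    allele_counts = Counter()
    for (a, b), n in genotype_counts.items():
        # A homozygote (a == b) adds 2n copies of one allele; a heterozygote adds n of each.
        allele_counts[a] += n
        allele_counts[b] += n
    total = sum(allele_counts.values())  # equals 2N
    return {allele: count / total for allele, count in allele_counts.items()}


# Hypothetical sample of N = 100 individuals at a three-allele locus.
example = {
    ("A1", "A1"): 10, ("A1", "A2"): 20, ("A1", "A3"): 10,
    ("A2", "A2"): 30, ("A2", "A3"): 20, ("A3", "A3"): 10,
}
freqs = allele_frequencies_multi(example)
print(freqs)  # prints {'A1': 0.25, 'A2': 0.5, 'A3': 0.25}
```

For A1, the formula gives the same value by hand: 0.10 + (1/2)(0.20 + 0.10) = 0.25.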
A Note About Parameters and Estimates

So far, we have not distinguished between populations and samples, or between parameters and
estimates. In general, we make inferences about populations by collecting samples that are
thought to be representative of the population. A parameter describes a quantity in the entire
population. In the DRD2 dopamine receptor gene example above, the parameter of interest is the
allele frequency of A1 (or equivalently, A2) in the entire Southwestern American Indian
population. Since it is impractical to sample the whole population, an estimate of the A1 allele
frequency is made by randomly sampling a subset of the population, in this case 459 individuals.
In practice, the distinction between a parameter and an estimate is an important one because
different samples may yield different values of the estimate due to sampling variation. If p
represents the true value of a parameter in the population, the estimate obtained from a sample is
usually designated p̂ (p with a hat). However, in this course, the distinction should be clear by the context.
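Sampling variation can be illustrated with a small simulation. The sketch below is hypothetical: it pretends that the true A1 frequency in the population is exactly 0.589 (in reality that value is itself an estimate) and assumes that allele copies are drawn independently from the gene pool. The function name and the seed are arbitrary choices.

```python
import random


def estimate_allele_frequency(p, n_individuals, rng):
    """Estimate p from 2 * n_individuals allele copies drawn from a gene pool with true frequency p."""
    copies = 2 * n_individuals
    count_a1 = sum(1 for _ in range(copies) if rng.random() < p)
    return count_a1 / copies


rng = random.Random(2012)  # fixed seed so the run is repeatable
# Five independent samples of 459 individuals from the same hypothetical population.
estimates = [estimate_allele_frequency(0.589, 459, rng) for _ in range(5)]
print([round(e, 3) for e in estimates])
```

Each run of the loop plays the role of a different sample of 459 individuals, so the printed values scatter around the assumed parameter rather than matching it exactly.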

Measures of Allele Frequency Variation

In order to document the amount of genetic variation in a population, it is desirable to define
what constitutes an appreciable allele frequency. To do so, it is common practice to set an
arbitrary limit for the frequency of the most common allele. Most people would agree that a
polymorphic locus is one for which the frequency of the most common allele is less than 0.95 or
0.99. This implies that the locus has at least two alleles. When every copy of the locus in the
population has the same allele, we say the locus is monomorphic. In principle, strict
monomorphism is unlikely to be common since mutation provides a continual supply of new
alleles. Moreover, it is very difficult to prove that a locus is monomorphic since our knowledge
of populations is obtained from samples. For example, if the alleles at a locus are all very rare,
most will not be observed in even large samples.
Geneticists use several measures to quantify genetic variation. The choice of which measure to
use is usually dictated by the nature of the problem at hand. The amount of heterozygosity in a
population is the most used measure of variation and is desirable for a number of reasons, but
several other measures are used as well. Heterozygosity (H) is the proportion of diploid
genotypes composed of two different alleles. Because individuals are either heterozygous or
homozygous at a given locus, this measure represents a biologically useful quantity. To illustrate
this concept, consider the data for the DRD2 Taq1A locus above. Since 211 of the 459
individuals are heterozygous, the heterozygosity or proportion of heterozygotes for this locus is
H=211/459=0.460. If the heterozygosity is known for a number of loci, then these values can be
averaged over all loci to obtain an average heterozygosity, denoted

\[ \bar{H} = \frac{1}{n} \sum_{l=1}^{n} H_l , \]

where n is the number of loci.
In a randomly mating population of diploid individuals, heterozygosity is also referred to as gene
diversity (Nei, 1987) and can be thought of as the probability of obtaining different alleles when
you draw two copies of a locus at random from the gene pool. If pi denotes the frequency of
allele Ai at a single locus, then the frequency of the AiAi homozygote is pi². Thus, the gene
diversity of the locus is

\[ H_e = 1 - \sum_{i=1}^{m} p_i^2 . \]

Gene diversity is typically denoted by He because it is equivalent to the expected heterozygosity
in a randomly mating population, a topic that we will cover in the next section. Note that
the complement of gene diversity is called gene identity and is given by

\[ 1 - H_e = \sum_{i=1}^{m} p_i^2 . \]

It is easily proved that maximum gene diversity is obtained when all alleles are of equal
frequency. For example, we leave it as an exercise to show that the maximum number of
heterozygotes for a 2-allele locus occurs when p1=0.5.
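A numerical scan is no substitute for the proof, but it shows where the maximum falls. This is a minimal sketch for the two-allele case; the function name gene_diversity and the grid spacing of 0.01 are illustrative assumptions.

```python
def gene_diversity(freqs):
    """Gene diversity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


# Scan p1 over a grid for a two-allele locus, with p2 = 1 - p1.
grid = [i / 100 for i in range(101)]
best_p1 = max(grid, key=lambda p1: gene_diversity([p1, 1 - p1]))
print(best_p1, gene_diversity([best_p1, 1 - best_p1]))  # prints 0.5 0.5
```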

Finally, the effective number of alleles, denoted by ne, is the number of equally frequent alleles
that would be required to produce a given gene diversity and is the inverse of the gene identity,
in other words,

\[ n_e = \frac{1}{1 - H_e} = \frac{1}{\sum_{i=1}^{m} p_i^2} . \]

It turns out that most of the time in population genetics we are more interested in the effective
number of alleles than the actual number of alleles since many of the alleles in a population may
be represented only once or twice and will thus contribute very little to the genetic variation. For
example, consider the allele frequencies at a single locus in the two hypothetical populations in
Table 3. Based on the number of alleles, population I, which has four different alleles, looks
more variable than population II, which has three different alleles. However, most of the alleles
in population I are of type A4 so that most members will have the A4A4 genotype. By contrast,
the three alleles in population II are about equally frequent so that all six genotypes will reach
appreciable frequencies. Note that unlike the actual number of alleles, the number of effective
alleles is not necessarily a whole number. Deeper reasons justify the use of the effective number
of alleles in population genetics theory, which we will re-visit in a later lecture.
Table 3. Allele frequencies and diversity

Population   A1     A2     A3     A4     m   ne     He
I            0.01   0.03   0.02   0.94   4   1.13   0.115
II           0      0.33   0.34   0.33   3   3.00   0.667
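The summary columns of Table 3 can be recomputed from the allele frequencies alone. The sketch below assumes the frequencies are listed in the order A1 to A4; the function and variable names are illustrative.

```python
def effective_number_of_alleles(freqs):
    """Inverse of the gene identity (sum of squared allele frequencies)."""
    return 1.0 / sum(p * p for p in freqs)


# Allele frequencies from Table 3, in the order A1, A2, A3, A4.
populations = {
    "I": [0.01, 0.03, 0.02, 0.94],
    "II": [0.0, 0.33, 0.34, 0.33],
}

results = {}
for name, freqs in populations.items():
    m = sum(1 for p in freqs if p > 0)  # actual number of alleles present
    n_e = effective_number_of_alleles(freqs)
    h_e = 1.0 - 1.0 / n_e               # gene diversity recovered from n_e
    results[name] = (m, round(n_e, 2), round(h_e, 3))
    print(name, *results[name])
# prints:
# I 4 1.13 0.115
# II 3 3.0 0.667
```

Population I has more alleles, but its effective number is barely above one because A4 dominates the gene pool.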

Study Questions

1. In your own words, what is a population (in the genetic sense) and why is it important to
define?
2. What is the difference between a parameter and an estimate and why does the distinction
matter?
3. A mutation at a previously monomorphic locus is acquired by a single individual. Is the
locus polymorphic? Why or why not?

Problem Set

1. In humans, the sex ratio of newborn infants is about 100:105 (females:males). (a) What
is the probability of having a girl? (b) What is the probability that a family has three
consecutive girls? (c) What is the probability that a family has two girls and two boys?
(d) If the family has three children, what is the probability that all three children will be
girls given that two of the children are girls?
2. Show that there are m(m+1)/2 possible genotypes at a locus with m alleles.

3. Show that maximum gene diversity occurs for a 2-allele locus when the alleles are
equally frequent.
4. Consider a 32-base-pair indel found in the human chemokine receptor gene CCR5.
Genotypes that are homozygous for the CCR5-Δ32 deletion are strongly resistant to
infection by HIV-1. The table below gives the numbers of individuals with each
genotype in a sample of 294 Parisians (Lucotte and Mercier, 1998), where + and Δ32
denote the non-deletion and deletion alleles, respectively. Determine the non-deletion
and deletion allele frequencies.

Genotypes for CCR5-Δ32 deletion

Genotype   +/+   +/Δ32   Δ32/Δ32   Total
Number     224   64      6         294

5. Determine the effective number of alleles for the data in problem #4. Interpret your
result.

Selected Answers

1. 0.488; 0.116; 0.375; 0.241
2. Hint: add up the number of homozygous and heterozygous genotypes
3. Hint: take the first derivative of the heterozygosity
4. P(+)=0.87; P(Δ32)=0.13
5. H=0.23; ne=1.29
