The theory developed in the preceding chapter can be used to calculate the rate of nucleotide substitution, which is
one of the most basic quantities in the study of molecular evolution. Indeed, in order to characterize the evolution of
a DNA sequence, the first thing we need to know is how fast it evolves. It is also interesting to compare the substitution
rates among genes or among different DNA regions, because this can help us understand the mechanisms responsible
for the different rates of nucleotide substitution during evolution.

In this chapter, we present data on the rates and patterns of nucleotide substitution and discuss three factors affecting
them: (1) functional constraint, (2) positive selection, and (3) mutational input. We also dissect the substitution rate
into its constituent parts in order to infer the pattern of substitution, in particular the pattern of spontaneous mutation.
Knowing the rate of nucleotide substitution may also enable us to date evolutionary events, such as the divergence
between species or higher taxa. This raises the issue of how variable the rate is among different evolutionary lineages.
We investigate this variation and attempt to identify the factors affecting it. The rates of evolution of nuclear, organelle,
and RNA genomes are also examined.

The rate of nucleotide substitution, r, is defined as the number of substitutions per site per year. The mean rate of
substitution can be calculated by dividing the number of substitutions, K, between two homologous sequences by 2T,
where T is the time of divergence between the two sequences (Figure 4.1). That is,

$$r = \frac{K}{2T} \qquad (4.1)$$

T is assumed to be the same as the time of divergence between the two species from which the two sequences were
taken, and is usually inferred from paleontological and biogeographical data. Equation 4.1 only holds when dealing
with distantly related species. When dealing with closely related species, such as humans and chimpanzees, we must
take into account the allelic divergence (polymorphism) that occurred within the ancestral population prior to the
divergence event (e.g., Takahata and Satta 1997).
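
The calculation described above, K divided by 2T, can be sketched in a few lines of Python. This is only an illustration: the function name and the example values of K and T are assumptions chosen for the sketch, not data from the text, and the simple formula ignores the ancestral polymorphism that matters for closely related species.

```python
def substitution_rate(k, t_years):
    """Mean rate r = K / (2T), in substitutions per site per year.

    k        -- number of substitutions per site between the two sequences
    t_years  -- time since the two sequences diverged, in years
    """
    if t_years <= 0:
        raise ValueError("divergence time must be positive")
    # The factor 2 accounts for the two lineages, each of which has been
    # accumulating substitutions for T years since the common ancestor.
    return k / (2.0 * t_years)


# Hypothetical example: K = 0.5 substitutions per site, T = 80 million years.
r = substitution_rate(0.5, 80e6)
print(f"r = {r:.3e} substitutions per site per year")
```

Dividing by 2T rather than T matters because K measures the total divergence accumulated along both branches leading from the common ancestor.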

In this section we shall deal with the issue of rate variation among genes and among different regions in a
gene. For this purpose, it is advisable to use the same species pair for all the genes under consideration. The
reason is twofold. First, there are usually considerable uncertainties about paleontological estimates of
divergence times. By using the same pair of species we can compare rates of substitution among genes
without knowledge of the divergence time. Second, the rate of substitution may vary considerably among
lineages. In this case, differences in rates between two genes may be due to differences between lineages
rather than to differences that are attributable to the genes themselves.
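
The first reason can be made concrete with a small Python sketch. When two genes are compared across the same species pair, T is shared, so the ratio of their rates reduces to the ratio of their K values and does not depend on the assumed divergence time. The function names and the K values below are hypothetical.

```python
def rate(k, t_years):
    """Mean substitution rate K / (2T) for one gene."""
    return k / (2.0 * t_years)


def relative_rate(k_gene_a, k_gene_b):
    """Rate of gene A relative to gene B for the same species pair.

    Because both genes share the same divergence time T, the factor 2T
    cancels and only the two K values are needed.
    """
    return k_gene_a / k_gene_b


# Hypothetical K values for two genes compared across one species pair.
k_a, k_b = 0.12, 0.04

# The ratio is the same whichever divergence time is assumed.
for t in (60e6, 80e6, 100e6):
    ratio = rate(k_a, t) / rate(k_b, t)
    print(f"T = {t:.0e} years: rate ratio = {ratio:.2f}")

print(f"ratio computed without T: {relative_rate(k_a, k_b):.2f}")
```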

Obtaining a reliable estimate of the rate of nucleotide substitution requires that the degree of sequence
divergence be neither too small nor too large. If it is too small, the rate estimate will be influenced by large
chance effects, whereas if it is too large, the estimate may be unreliable due to the difficulties in correcting
for multiple substitutions at the same site (Chapter 3).
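
The trade-off can be illustrated numerically. The sketch below assumes the one-parameter (Jukes-Cantor) correction for multiple substitutions and its large-sample variance; the choice of model, the function names, and the sequence length of 1,000 sites are assumptions made for illustration, not the procedure used for the data in this chapter.

```python
import math


def jc_distance(p):
    """Jukes-Cantor estimate of substitutions per site, K, from the
    observed proportion p of sites that differ between two sequences."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_standard_error(p, n_sites):
    """Approximate (large-sample) standard error of the Jukes-Cantor K."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to exist")
    variance = p * (1.0 - p) / (n_sites * (1.0 - 4.0 * p / 3.0) ** 2)
    return math.sqrt(variance)


def relative_error(p, n_sites):
    """Standard error of K as a fraction of K (requires p > 0)."""
    if p <= 0.0:
        raise ValueError("p must be positive to compute a relative error")
    return jc_standard_error(p, n_sites) / jc_distance(p)


# Hypothetical comparison: 1,000 aligned sites at low, moderate, and
# near-saturated divergence.
for p in (0.01, 0.30, 0.70):
    print(f"p = {p:.2f}  K = {jc_distance(p):.3f}  "
          f"relative error = {relative_error(p, 1000):.2f}")
```

Under these assumptions the relative error is larger at both extremes than at moderate divergence: at low p only a few differences are observed, and as p approaches 0.75 the correction term grows without bound.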

Coding regions
The protein-coding regions of genes have attracted the most attention from both molecular and evolutionary
biologists because of their functional and medical importance. As a consequence, a large amount of
sequence data has become available for these regions, and many comparative studies of nucleotide
sequences in these regions have been published (e.g., Ohta 1995). In dealing with protein-coding sequences,
it is important to discriminate between nucleotide changes that affect the primary structure of the encoded
protein, i.e., nonsynonymous substitutions, and changes that do not affect the protein, i.e., synonymous
changes.
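
As a minimal sketch of this distinction, the following Python code classifies a single-nucleotide change within a codon by translating both codons with the standard genetic code. The function name is hypothetical, and the sketch only labels one change at a time; it is not the estimation method used for the rates in this chapter, which must also count synonymous and nonsynonymous sites and correct for multiple substitutions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before, codon_after):
    """Label a single-nucleotide codon change as synonymous or nonsynonymous.

    Changes to or from a stop codon ('*') alter the encoded protein and are
    therefore reported as nonsynonymous here.
    """
    before = codon_before.upper().replace("U", "T")
    after = codon_after.upper().replace("U", "T")
    if before not in GENETIC_CODE or after not in GENETIC_CODE:
        raise ValueError("codons must be three letters drawn from A, C, G, T")
    n_differences = sum(a != b for a, b in zip(before, after))
    if n_differences != 1:
        raise ValueError("codons must differ at exactly one position")
    if GENETIC_CODE[before] == GENETIC_CODE[after]:
        return "synonymous"
    return "nonsynonymous"


print(classify_change("CTT", "CTC"))  # leucine to leucine
print(classify_change("GAG", "GTG"))  # glutamic acid to valine
```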

In Table 4.1 we list the rates of synonymous and nonsynonymous substitution for 47 protein-coding genes.
The genes in each group are arranged in order of increasing rate of nonsynonymous substitution. The rates
were obtained from comparisons between human and murid (rat or mouse) homologous genes, by using the
method of Li (1993) and Pamilo and Bianchi (1993), and by setting the time for the human-murid
divergence event at 80 million years ago. We note that the rate of nonsynonymous substitution is extremely
variable among genes. It ranges from effectively zero in actin α to about 3.1 × 10⁻⁹
substitutions per nonsynonymous site per year in interferon γ. Nonsynonymous nucleotide substitutions
are, of course, reflected in the rates of protein evolution, which may vary by as much as three orders of
magnitude. As is well known, certain proteins (e.g., histones, actins, and ribosomal proteins) are extremely
conservative. A very extreme case is ubiquitin, which is completely conserved between human and
Drosophila, and which differs among animals, plants, and fungi by only 2 or 3 out of 76 amino acid residues.
Some peptide hormones (e.g., somatostatin-28, glucagon, and insulin) are also extremely conservative, but
others have evolved at intermediate rates (e.g., parathyroid hormone and erythropoietin) or at high rates
(e.g., relaxin). The insulin C-peptide has often been used as an example of rapid evolution, but it actually
evolves considerably more slowly than relaxin. Hemoglobins, myoglobin, and some carbonic anhydrases
have evolved at intermediate rates, while apolipoproteins, immunoglobulins, interferons and interleukins
have evolved very rapidly. Apolipoprotein B is a huge protein (4,536 amino acids) and the relatively high
nonsynonymous rate in the region included in Table 4.1, which is the most conservative part of the protein,
implies that the whole
