
Estimating an Age of Splice-site Mutation Causing Hypotrichosis with Juvenile Macular Dystrophy Disease in Pakistani Population
Adeel Ahmed1, Khalid Saleem2
1,2 Department of Computer Science
Quaid-i-Azam University
Islamabad, Pakistan
1 aahmedqau@gmail.com, 2 ksaleem@qau.edu.pk

Abstract-The splice-site mutation at the locus on human chromosome 16q22.1 causes hypotrichosis with juvenile macular dystrophy (HJMD) in the Pakistani population [1]. Different statistical methods have been used to estimate the age of mutations. We analyzed a CDH3 gene dataset consisting of genotypes of a Pakistani family in order to compare two methods for estimating the age of the IVS10-1 G→T mutation that occurs in some affected members of the family. The first method is DMLE+ [2], a genealogy-based method that estimates the mutation age from the population growth rate and genotype data. The second method [3] is based on allele frequency. We found that the mutation age varies with population size, growth rate and mutant allele frequency. The DMLE+ [2] estimates of the age of the IVS10-1 G→T mutation are 232, 238 and 226 generations for three simulation runs, each with a 95% credible interval. Method [3] gives an estimated age of 138 generations, with time scaled in units of 2N generations.
Keywords-Gene genealogy; Mutation; Population genetics;
Statistical computing; Linkage disequilibrium

I. INTRODUCTION

The age of a mutation is the time since the mutation originated, measured back to its most recent common ancestor (MRCA). Estimating the age of a mutation that occurs in autosomes differs slightly from estimating an age using the X-chromosome or Y-chromosome, because it involves additional stochastic processes. The age of a mutation can be estimated from a genealogy or from the allele frequency. Population genetics provides a way to study the distribution and change of allele frequency under evolutionary processes such as natural selection, genetic drift, mutation and gene flow.
Hypotrichosis with juvenile macular dystrophy (HJMD; OMIM 601553) is a rare autosomal recessive disorder of the CDH3 gene (OMIM 114021) that causes hair loss and blindness; it has been reported in a consanguineous Pakistani family [1]. The disorder is due to a splice-site mutation. The affected individuals carry a homozygous recessive transversion mutation (IVS10-1 G→T). In the CDH3 gene, the mutation occurs due to genetic drift and resides in intron 10, near the splice acceptor site of exon 11, on chromosome 16q22.1 [1].

978-1-4673-4450-0/12/$31.00 2012 IEEE

Sulman Basit3
3 Department of Biochemistry
Quaid-i-Azam University
Islamabad, Pakistan
3 basitphd@bs.qau.edu.pk

Our Contributions
We have estimated the age of the splice-site (IVS10-1 G→T) mutation that occurs in the CDH3 gene in a Pakistani family. We used two methods: the first, proposed by Kimura et al. [3], is based on allele frequency; the second is DMLE+, proposed by Rannala et al. in 2001 [2]. We compared the two approaches for predicting the age of the splice-site mutation with the same parameters on the CDH3 gene dataset. We found that the mutation age predicted by one method lies close to the credible intervals predicted by the second method. We performed three independent simulation runs of Markov chain Monte Carlo (MCMC) [2] and obtained consistent results for the mutation age.
The organization of this paper is as follows: Section 2 describes the related work, and Section 3 describes the statistical methods used for predicting a mutation age. Section 4 describes the real dataset, Section 5 presents experimental results, and Section 6 presents conclusions and future work.
II. RELATED WORK

A mutation age can be predicted from a genealogy or from the allele frequency. We discuss both families of methods here, as both are within the scope of our work.
A. Estimate of Mutation Age based on Genealogy
Preliminary work on intra-allelic variability was first proposed by Serre et al. in 1990 [4] for the ΔF508 mutation that occurs in the CFTR gene. In this approach, di-allelic loci are considered in a sample taken from 240 French families. The estimate of allele age is obtained from a moment estimator. The confidence interval of an age estimate obtained with a moment estimator is uncertain [5]; the sources of uncertainty are the recombination rate and the mutation rate.
In 1999, Griffiths et al. [6] proposed an infinitely-many-sites mutation model that is represented by a unique gene tree. The ages of the mutations and the age of the most recent common ancestor are estimated, and the conditional distributions of the ages are found for a Melanesian population at the β-globin locus using a diploid dataset [7]. A coalescent model, which models the evolution of DNA sequences, gives the ancestral relationship among a number of DNA sequences. In the coalescent process, mutations are placed on the tree as a Poisson process of rate θ/2, where θ is a function of the population size N and the mutation rate M per sequence per generation. The probability distribution of trees can be computed from a fundamental recursion for the probabilities, evaluated with a Monte Carlo simulation approach.
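The coalescent description above can be illustrated with a minimal sketch. It is not the method of [6]: it only simulates a standard coalescent genealogy and drops mutations on it as a Poisson process of rate θ/2 per unit branch length. The function names, the sample size n = 10 and θ = 5 are assumptions chosen for illustration.

```python
import random


def simulate_mutation_count(n, theta, rng):
    """Simulate one coalescent genealogy for n sequences and return the
    number of mutations placed on it (infinitely-many-sites model)."""
    total_branch_length = 0.0
    for k in range(n, 1, -1):
        # While k lineages remain, the waiting time to the next coalescence
        # is exponential with rate k(k-1)/2 (time in coalescent units).
        t_k = rng.expovariate(k * (k - 1) / 2.0)
        total_branch_length += k * t_k
    # Mutations arrive as a Poisson process of rate theta/2 along the tree:
    # count the exponential inter-arrival gaps that fit in the total length.
    mutations = 0
    position = rng.expovariate(theta / 2.0)
    while position < total_branch_length:
        mutations += 1
        position += rng.expovariate(theta / 2.0)
    return mutations


def expected_mutation_count(n, theta):
    """Standard expectation under this model: theta * sum_{i=1}^{n-1} 1/i."""
    return theta * sum(1.0 / i for i in range(1, n))


rng = random.Random(1)      # fixed seed so the sketch is reproducible
n, theta = 10, 5.0          # hypothetical illustration values
replicates = 20000
mean_count = sum(simulate_mutation_count(n, theta, rng)
                 for _ in range(replicates)) / replicates
print(mean_count, expected_mutation_count(n, theta))
```

Averaged over many replicates, the simulated mutation count should approach the analytical expectation, which is a quick sanity check on the Poisson rate θ/2.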
In 2002, Yu et al. [8] used the parsimony principle to deduce the number of mutations. The population parameter used here is θ, defined as 4NM. Neutrality tests are performed to test the hypothesis that a population has evolved according to the Wright-Fisher model with constant effective population size. A gene tree program is used for the mutation age calculation.
In 1996, Goldgar et al. [9] implemented a multiple-marker likelihood method for estimating a mutation age. This method allows a mutation to occur more than once in a genealogy tree, but it requires the mutation-carrying haplotypes to be known. Different mutation rates can be set for different types of markers, but the growth rate is not taken into account.
A modified Goldgar method is presented in [10], where the authors modified the likelihood to allow for haplotype uncertainty; that is, a mutation may occur more than once in a genealogy tree even if the haplotypes carrying the mutation are uncertain.
B. Estimate of Mutation Age based on Allele Frequency
In 1975, Maruyama et al. proposed a diffusion model [11] to find the approximate age of a neutral mutant allele when the allele frequency is fixed; it also gives the age before fixation of the allele. In this model, the expected value for the mutant allele is calculated along a sample path that starts from an initial allele frequency p at time 0 and reaches x at some later time. To obtain confident results for the allele age, Monte Carlo experiments are performed.
In 1975, Li [12] introduced a method to find the age of a deleterious mutation that causes a severe disease. He estimated the time at which the allele reached its present frequency by using diffusion methods. Genotype data from a population of Africa are considered. Since a deleterious mutation occurs due to natural selection or random genetic drift, the selection coefficient against heterozygotes is taken as constant. The author applied the result to the case where the mutation rate is low and the population size is constant, and concluded that the mean age is larger than the variance of the age.
In 2003, Griffiths [13] estimated the expected age of an allele A having frequency x as a sample path average in a diffusion process, that is

$$E(\mathrm{Age}) = \int_0^1 G(x, y)\, \frac{u_0(y)}{u_0(x)}\, dy \tag{1}$$

where $G(x, y)$ is the Green function of the diffusion and $u_0(x)$ is the probability of absorption at initial frequency x.
A coalescent model was proposed by Griffiths et al. in 1998 [14], in which n,b denotes the age of a mutant gene, that is, there exist b copies of the mutant gene in a sample of n chromosomes. When the mutation occurred, the sample had k ancestors. The conditional distribution of n,k is that of UTk + Sk+1.
In 1976, Thompson et al. [15] estimated the age of a variant based on the replicates observed in a population. The likelihood is studied on the basis of a discrete branching process model. The likelihood analysis provides a confidence interval for the time at which a variant originated. A stochastic evolution process is required because the age is defined as a random variable. It is assumed that each variant has a unique origin and that each mutant gene reproduces independently.
In 2000, Colombo [16] estimated the age of the N370S mutation using (2),

$$g = \frac{\log \delta}{\log (1 - \theta)} \tag{2}$$

where $\delta$ is an LD measure and $\theta$ is the recombination fraction. The age estimated by (2) is an underestimate for a growing population, because the genetic clock ticks more slowly than expected. Therefore, the age estimate for a growing population is adjusted according to the Luria-Delbruck correction as follows:

$$g_c = g + g_0 \tag{3}$$

where g is the number of generations estimated from (2) and $g_0 = -(1/d)\,\ln(\theta f_d)$, with $f_d = e^d / (e^d - 1)$, in a growing population with growth rate d.
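As a worked illustration of (2) and (3), the following sketch computes the LD-based age and the Luria-Delbruck offset. The input values (δ = 0.5, θ = 0.01, d = 0.02) are hypothetical and are not taken from [16]. Reading the argument of the logarithm in g0 as θ·f_d is an assumption, since that symbol is not legible in the text.

```python
import math


def ld_age_generations(delta, theta):
    """Eq. (2): g = log(delta) / log(1 - theta)."""
    return math.log(delta) / math.log(1.0 - theta)


def luria_delbruck_offset(theta, d):
    """g0 = -(1/d) * ln(theta * f_d), with f_d = e^d / (e^d - 1).
    Assumption: the logarithm's argument is theta * f_d."""
    f_d = math.exp(d) / (math.exp(d) - 1.0)
    return -(1.0 / d) * math.log(theta * f_d)


# Hypothetical inputs, for illustration only.
delta = 0.5    # LD measure
theta = 0.01   # recombination fraction
d = 0.02       # growth rate per generation

g = ld_age_generations(delta, theta)
g0 = luria_delbruck_offset(theta, d)
g_c = g + g0   # Eq. (3): corrected age for a growing population
print(round(g, 2), round(g0, 2), round(g_c, 2))
```

With these inputs the correction adds generations to the uncorrected estimate, consistent with the statement that (2) underestimates the age in a growing population.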
III. PROPOSED STATISTICAL METHODS USED FOR MUTATION AGE ESTIMATION

We have estimated the age of the splice-site (IVS10-1 G→T) mutation using DMLE+ [2] and the method of [3].
Linkage disequilibrium can help in finding both the age and the location of a mutation. A Markov chain Monte Carlo method is used to estimate the mutation age and location jointly, which is called joint estimation. The DMLE+ work [2] describes an extension of Bayesian linkage disequilibrium mapping. The MCMC method generates the joint posterior density of the parameters using the Metropolis-Hastings algorithm. This method is implemented in the program DMLE+ [2]. Bayesian linkage disequilibrium gene mapping is an intra-allelic coalescent model that uses multiple genetic markers to find linkage among marker alleles. MCMC methods are used to estimate the likelihood function, to identify the haplotype on which the mutation arose, the position of the disease locus and the ancestral haplotypes, and to estimate the time of origin of the mutation.
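To make the Metropolis-Hastings idea concrete, here is a minimal random-walk sampler over a single age parameter. It is a toy sketch and not the DMLE+ implementation: the target density (a normal shape centred at 200 generations with spread 40, restricted to positive ages) is an arbitrary stand-in for a real posterior, and all names and tuning values are assumptions.

```python
import math
import random


def log_target(age):
    """Unnormalised log-density of a toy posterior over mutation age.
    Assumption: an arbitrary normal shape, NOT the DMLE+ likelihood."""
    if age <= 0:
        return float("-inf")
    return -0.5 * ((age - 200.0) / 40.0) ** 2


def metropolis_hastings(n_samples, start, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    samples = []
    current = start
    current_lp = log_target(current)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(0)                       # fixed seed for reproducibility
samples = metropolis_hastings(50000, start=100.0, step=60.0, rng=rng)
kept = sorted(samples[5000:])                # discard burn-in, then sort
mean_age = sum(kept) / len(kept)
lower = kept[int(0.025 * len(kept))]         # 2.5% posterior quantile
upper = kept[int(0.975 * len(kept))]         # 97.5% posterior quantile
print(round(mean_age), round(lower), round(upper))
```

The 2.5% and 97.5% quantiles of the retained samples form a 95% credible interval, which is the kind of interval reported for each simulation run. Repeating the run with different seeds is a simple way to check that the chains give consistent results.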

Figure 1. Pedigree of the family with hypotrichosis and juvenile macular dystrophy. Filled symbols, affected subjects; open symbols, unaffected subjects. The
disease-associated haplotypes are shown beneath each symbol. [1]

A. Mutation Analysis and Input Parameters for DMLE+
Mutation analysis is performed using the genotypes of both affected and control individuals of the Pakistani family, with microsatellite markers closely linked to the CDH3 gene. The parameters collected from the CDH3 gene dataset are given as inputs to the DMLE+ (release 2.3) program. These parameters are:

• Genotype data as shown in Fig. 1.
• The Mendelian inheritance variable is set to recessive.
• The genetic distances between markers are those shown in Fig. 1.
• Since the splice-site mutation was discovered by Jelani et al. in 2008 [1], we used the growth rate of the population of Pakistan for 2008, which according to [17] was 0.02.
• The proportion of disease chromosomes in the sample is 0.001.

The remaining parameters are set to the DMLE+ defaults.
The second method [3] that we used for estimating the age of the splice-site mutation is based on allele frequency. The frequency of an allele in a population changes due to evolutionary processes such as natural selection, genetic drift, mutation and gene flow. In 1973, Kimura et al. [3] derived the expected age of a mutant allele that has frequency x in a population of constant size, that is

$$E(t_1) = -\frac{2x}{1 - x}\,\ln x \tag{4}$$

where x is the frequency of the mutant allele. This formula gives the age in scaled time units, say z.
According to molecular population theory [8], if we assume a constant population size of N = 10000, then z × 10000 = w generations. To find the age in years we assume a human generation length of 20 years [8]. Therefore, the age of the mutation in years is w × 20 = k years.
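The arithmetic of (4) and the two unit conversions can be sketched as follows. The function names are illustrative. The allele frequency x = 0.001 is an assumption, chosen because it reproduces the scaled age of 0.0138 reported in the paper. N = 10000 and a 20-year generation length are the values stated in the text.

```python
import math


def expected_age_scaled(x):
    """Eq. (4): expected age of a neutral mutant allele at frequency x,
    in scaled time units."""
    return (-2.0 * x / (1.0 - x)) * math.log(x)


def age_in_years(x, population_size, generation_length):
    """Convert the scaled age to generations (z * N), then to years (w * L),
    following the convention used in the text."""
    generations = expected_age_scaled(x) * population_size
    return generations * generation_length


x = 0.001                     # assumed mutant allele frequency
z = expected_age_scaled(x)    # scaled time units
w = z * 10000                 # generations, with N = 10000
k = age_in_years(x, 10000, 20)
print(round(z, 4), round(w), round(k))
```

The multiplier here follows the text's convention of z × N. If the scaled unit is instead taken as 2N generations, as the abstract describes it, the multiplier would be 2N and the result would double.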
B. Method: Mutation Age Calculation
1) Input: N, IR, L
a) N: Total number of individuals in a population
b) IR: Incidence Rate
c) P: Mutant Allele Frequency
d) L: Generation Length
2) Output: Mutation age in years
3) Procedure MutationAge (N, IR, L, P, AGEINYEARS)
i) P ← IR
ii) E(t1) ← (-2 * P / (1 - P)) * ln(P)
iii) AGEINYEARS ← L * N * E(t1)
iv) return AGEINYEARS

In 2000, Slatkin et al. [5] used this formula for age estimation and formulated a maximum likelihood estimate (MLE) for the expected time of mutation. The MLE is

ln (1 p) 2 / n          (5)

IV. DATASET

The statistical methods for age estimation discussed in Section 3 are applied to the CDH3 gene dataset of a consanguineous Pakistani family. This dataset was provided by the Department of Biochemistry, Quaid-i-Azam University, Islamabad, Pakistan. The mutation analysis was carried out on the genotype data by examining 13 members of the family, 6 affected and 7 unaffected, using an automated DNA sequencer. The 6 affected members carry the splice-site (IVS10-1 G→T) mutation on chromosome 16q22.1 and are affected with HJMD, as shown in Fig. 1. The total number of chromosomes, including cases and controls, is 13, and these are used as inputs to the DMLE+ program.

V. EXPERIMENTAL RESULTS

According to [17], the growth rate r of the Pakistani population in 2008 was 0.02. Using r = 0.02, we obtained 95% credible intervals in units of generations. We performed three independent simulation runs of the MCMC method [2] with the same parameters in order to obtain consistent results for the age of the IVS10-1 G→T mutation. Table I shows the mutation ages estimated using DMLE+, which are very close to each other and lie within the credible intervals.

TABLE I. ESTIMATES OF MUTATION AGE FOR IVS10-1 G→T

Simulation Run (MCMC)    Estimated mutation age (generations)    95% credible interval (DMLE+)
1                        232                                     [144, 364]
2                        238                                     [152, 385]
3                        226                                     [156, 374]

According to (4), the expected time of the mutation is 0.0138 in scaled time units. According to molecular population theory [8], if we assume a population size of N = 10000, which is a minimum estimate of the population size of modern humans during the period before recent growth, and assume that this population size is constant, then the mutation age is 0.0138 × 10000 = 138 generations.
Equation (5) gives a maximum likelihood estimate of 0.098 for the mutation age. The expected time of the mutation, E(t1) = 0.0138, therefore differs from the MLE.

VI. CONCLUSIONS

In this paper, we have considered the problem of mutation age estimation. We analyzed a CDH3 gene dataset for a family from the Pakistani population, some of whose members carry the splice-site (IVS10-1 G→T) mutation that causes HJMD. We applied two approaches, method [2] and method [3], to the CDH3 gene dataset and estimated the mutation age with 95% credibility. We observed that the mutation age varies with the population growth rate and the allele frequency. The time of mutation estimated with DMLE+ is more reliable than that of method [3], as DMLE+ better represents gene genealogies and their variability. The mutation age estimated by method [3] lies close to the credible intervals estimated using DMLE+. Method [3] is nevertheless a good choice if a mutation occurs in a gene due to some evolutionary process and parametric information is not readily available. In future, we will apply these methods to more datasets.

REFERENCES

[1] M. Jelani, M. S. Chishti and W. Ahmad, "A novel splice-site mutation in the CDH3 gene in hypotrichosis with juvenile macular dystrophy," Journal of Clinical and Experimental Dermatology, vol. 34, pp. 68-73, 2008.
[2] B. Rannala and J. P. Reeve, "DMLE+: Bayesian linkage disequilibrium gene mapping," Bioinformatics, vol. 18, pp. 894-895, 2002.
[3] M. Kimura and T. Ohta, "The age of a neutral mutant persisting in a finite population," Genetics, vol. 75, pp. 199-212, 1973.
[4] J. L. Serre, B. Simon-Bouy, E. Mornet, B. Jaume-Roig, A. Balassopoulou, et al., "Studies of RFLP closely linked to the cystic fibrosis locus throughout Europe lead to new considerations in population genetics," Hum. Genet., vol. 84, pp. 449-454, 1990.
[5] M. Slatkin and B. Rannala, "Estimating Allele Age," Annual Review of Genomics and Hum. Genet., vol. 01, pp. 225-249, 2000.
[6] R. C. Griffiths and S. Tavare, "The Ages of Mutations in Gene Trees," The Annals of Applied Probability, vol. 9, pp. 567-590, 1999.
[7] S. M. Fullerton, R. M. Harding, A. J. Boyce and J. B. Clegg, "Molecular and population genetic analysis of allelic sequence diversity at the human β-globin locus," Proc. Nat. Acad. Sc. U.S.A, vol. 91, pp. 1805-1809, 1994.
[8] N. Yu, Y. Fu and W. Li, "DNA Polymorphism in a World Wide Sample of Human X Chromosomes," Mol. Biol. Evol., vol. 19, pp. 2131-2141, 2002.
[9] S. L. Neuhausen, S. Mazoyer, L. Friedman, M. Stratton, K. Offit, A. Caligo, G. Tomlinson, L. Cannon-Albright, T. Bishop, D. Kelsell, E. Solomon, B. Weber, F. Couch, J. Struewing, P. Tonin, F. Durocher, S. Narod, M. H. Kolnick, G. Lenoir, O. Serova, B. Ponder, D. Stoppa-Lyonnet, D. Easton, M. C. King, D. E. Goldgar, "Haplotype and Phenotype Analysis of Six Recurrent BRCA1 mutation in 61 families, Results of an International study," Am. J. Hum. Genet., vol. 58, pp. 271-280, 1996.
[10] C. MT. Greenwood, S. Sun, J. Veenstra, N. Hamel, B. Niell, S. Gruber and W. D. Foulkes, "How old is this mutation? - a study of three Ashkenazi Jewish founder mutations," BMC Genetics, 11:39, 2010.
[11] T. Maruyama and M. Kimura, "Moments for sum of an Arbitrary Function of Gene Frequency along a Stochastic Path of Gene Frequency Change," Proc. Nat. Acad. Sci. U.S.A, vol. 72, pp. 1602-1604, 1975.
[12] W. H. Li, "The first arrival time and mean age of a deleterious mutant gene in a finite population," Am. J. Hum. Genet., vol. 27, pp. 276-287, 1975.
[13] R. C. Griffiths, "The frequency spectrum of a mutation and its age, in a general diffusion model," Theoretical Population Biology, UK, vol. 64, pp. 241-251, 2003.
[14] R. C. Griffiths and S. Tavare, "The age of mutation in a general coalescent tree," Commun. Statist. - Stochastic Models, vol. 14, pp. 273-295, 1998.
[15] E. A. Thompson, "Estimation of age and rate of increase of rare variants," Am. J. Hum. Genet., 28:442-52, 1976.
[16] R. Colombo, "Age Estimate of the N370S Mutation Causing Gaucher Disease in Ashkenazi Jews and European Populations: A Reappraisal of Haplotype Data," Am. J. Hum. Genet., vol. 66, pp. 692-697, 2000.
[17] THE WORLD FACTBOOK, Source: https://www.cia.gov/library/publications/the-world-factbook/fields/2002.html.
