Chapter 12 - M Ultifactorial Inheritance and Complex Diseases

CHAPTER
12
Multifactorial Inheritance
and Complex Diseases
Christine W Duarte, Laura K Vaughan, T Mark Beasley,
and Hemant K Tiwari
Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham,
Birmingham, AL, USA
This article is a revision of the previous edition article by Hemant K Tiwari, T Mark Beasley, Varghese George and David B Allison,
volume 1, pp 299306, 2007, Elsevier Ltd.
12.1 INTRODUCTION
12.2 DEFINITIONS AND TERMINOLOGY
If a disease or condition is caused by a single locus of

large effect, it is called a single-gene or monogenic disease, disorder, or more generically, condition. There are
over 10,000 such examples, which include cystic fibrosis,
Huntington disease, Duchenne muscular dystrophy, and
Marfan syndrome. It is possible that a single-gene disease has locus heterogeneity if that disease is caused by
single mutations in different genes, but this is more properly considered a special case of an oligogenic disorder.
For example, osteogenesis imperfecta is caused by single
mutation in genes on either chromosome 7 or chromosome 17. Oligogenic disorders are explained by a few
loci with large effects (for examples, see Reference (1)).
In contrast to oligogenic traits, polygenic inheritance is
due to many loci with small effects at each locus. Thus,
the term polygenic is generally used to describe multiple
factors that are exclusively genetic. Any of these genetic
effects, with or without the combination of an environmental effect, can give rise to a multifactorial disorder.
Multifactorial diseases are caused by the simultaneous
action of multiple genetic and/or environmental factors.
In contrast to dichotomous traits (i.e. affected versus
unaffected), quantitative traits are measured on a continuous scale, most of which are thought to be multifactorial (e.g. blood pressure and body mass index). Some
quantitative traits may be due to major gene effects with
a multifactorial background. Multifactorial inheritance
is responsible for the majority of modern deleterious
health conditions such as heart disease and diabetes.
Atopic syndrome, diabetes, cancer, spina bifida/anencephaly, pyloric stenosis, cleft palate, congenital hip dysplasia, club foot, and many other disorders and complex
phenotypes result from multifactorial inheritance.
The polygenic model has its origins from Fishers seminal work (2), which showed that many small, equal and
additive loci would result in a Gaussian (or normal)
distribution for a phenotype. Similarly, the combined
additive effects of many genetic and environmental factors will also produce an approximately Gaussian phenotypic distribution. To illustrate, suppose (navely)
that a quantitative trait such as percent body fat is determined by a single gene with two codominant1 alleles, A
and a, which have equal frequency (p=0.50). Assume
individuals with an A allele tend to have a higher value
of the trait, and individuals with an a allele tend to have
a lower value of the trait. If A has an additive effect, then
there are three distinct phenotypic groups, namely, high
(2), intermediate (1), and low (0). If the allele frequencies of A and a are both 0.50, then 25% of individuals
would be expected to be aa and of low-percent fat, 50%
would be expected to be Aa and of moderate-percent
fat, and 25% would be expected to be AA and of highpercent fat. Figure 12-1 gives the distribution of the trait
in a population.
Now, suppose that the trait is determined by two loci.
The second locus also has two codominant alleles, B for
high and b for low expression of the trait, with B having
an allele frequency of 0.50 and the same effect magnitude as the A allele. There are now nine possible genotypes (see Table 12-1).
1An
allele a is said to be codominant with respect to the wild-type

allele A if the A/a heterozygote fully expresses both the phenotypes associated with the a/a and A/A homozygotes.from http://
www.informatics.jax.org/javawi2/servlet/WIFetch?page=glossary
Term&key=codominant.
2013, Elsevier Ltd. All rights reserved.
CHAPTER 12 Multifactorial Inheritance and Complex Diseases

TABLE 12-2 Genotypic Values of Two Loci
Genotypes
BB
Bb
bb
AA
Aa
aa
4
3
2
3
2
1
2
1
0
FIGURE 12-1 Expected phenotype distribution for a trait with a

single causal locus with an allele frequency of 50% and in Hardy
Weinberg equilibrium.
TA B L E 1 2 - 1 Frequency Distribution of
Genotypic Values for Two Loci
with No Linkage Disequilibrium
BB
Bb
Bb
AA
Aa
aa
0.0625
0.1250
0.0625
0.1250
0.2500
0.1250
0.1250
0.1250
0.0625
An individual can possess 0, 1, 2, 3, or 4 hightrait alleles. Assuming that the combined effects of the
two loci are also additive,2 there are five distinct phenotypes with respect to the number of high-trait alleles (see
Table12-2).
The trait distribution with respect to genotypic value
distribution is shown in Figure 12-2. As can be seen in
Figure 12-2, even with two loci, the distribution of the
phenotype starts to look Gaussian. An example of a
three-locus system with equal allele frequencies, no linkage disequilibrium,3 and equal additive effects, is shown
in Figure 12-3. It can be shown that six diallelic loci
are enough to produce population frequencies virtually
indistinguishable from a normal curve.
Many traits (or diseases) are treated as dichotomous
variables because they appear to be either present or absent
(e.g. cancer). By definition, dichotomous variables do not
approximate a Gaussian distribution. However, these
diseases may still be polygenic or multifactorial because
2That
is, not epistatic, where epistatic refers to an interaction (in

the statistical, not necessarily biochemical sense) between two different loci, such that the effect of genotype at one locus depends on
the genotype at another locus.
3Linkage disequilibrium is defined as the nonrandom association
between alleles at linked (or adjacent) loci (133). Two loci are said
to be linked if they are close sufficiently close on the same chromosome such that they do not segregate independently.
FIGURE 12-2 Expected phenotype distribution for a trait with
two independently segregating causal loci of equal effect and allele

frequency.
FIGURE 12-3 Expected phenotype distribution for a trait with
three independently segregating causal loci of equal effect and

allele frequency.
they do not follow the patterns expected of Mendelian

(single-gene) diseases. A common explanation is that an
underlying liability distribution exists for multifactorial
diseases (3). Individuals on the low end of the distribution
have little chance of developing the disease because they
possess few of the alleles or environmental factors that
jointly cause the disease. By contrast, individuals on the
high end of the liability distribution have a greater chance

of developing the disease because they possess many of
the alleles and/or environmental factors that jointly cause
the disease. The liability distribution is assumed to be
continuous (representing the sum of a large number of
independent genetic and environmental factors) and normally distributed within the population. It is also a commonplace to assume that all correlations between relatives
are due to shared genes but not shared environment. For
multifactorial diseases that are either present or absent,
there is a hypothesized threshold of liability that must be
crossed before the disease is manifest (3).
For example, consider the development of the cleft
palate. Early in embryonic development, the palatal
arches are in a vertical position. Through embryonic
and fetal development, the head grows larger, making
the arches farther apart, and the tongue increases in size,
making it more difficult to move. In addition, the arches
themselves are growing and turning horizontally. There
is a critical stage in development by which the two arches
must meet and fuse. Head growth, tongue growth, and
palatal arch growth are all subject to many genetic and
environmental factors. If the two arches start to grow in
time, grow at the proper rate, and begin to move soon
enough to the horizontal, they will meet and fuse in spite
of head size and tongue growth. The result is no cleft
palate. They may fuse well ahead of the critical developmental stage or just barely make it in time; it is impossible to know. However, if they dont meet at the critical
stage, a cleft palate results. If they are close together at
the critical stage, a small cleft will result, perhaps only a
bifurcated uvula. If they are far apart, a more severe cleft
will result. That critical difference in liability is called the
threshold. Beyond the threshold, disease results. Below
the threshold, normal development is observed. Thus
the underlying liability is distributed as the normal curve
shown in Figure 12-4.
Some diseases may have more than one threshold, and
commonly two liability thresholds are present as defined
by factors such as gender, race, age of onset, and so on,
causing different levels of severity (4). Examples include
pyloric stenosis (sex dimorphism for liability) (5) and
orofacial cleft syndrome/cleft lip and palate (two thresholds for fetal mortality and disease) (6,7). The latter
model proposes a lower threshold level of liability resulting in a cleft formation and a higher level causing a fetal
death (preferentially in males).
It should be emphasized that, similar to any other loci,
the individual loci underlying a polygenic or multifactorial trait are generally assumed to follow the Mendelian principles of random segregation and independent
assortment.4 The difference is that they act together
to influence the trait. Thus, the multifactorial model
assumes:
4Good
descriptions of these principles can be found in the following

web pages: http://biology.about.com/library/weekly/aa110603a.htm
and http://biology.about.com/library/weekly/aa100903a.htm.
FIGURE 12-4 Liability distribution for a multifactorial disease. To be

affected with the disease, an individual must exceed the threshold.
1. several, but not an unlimited number, of loci are

involved in the expression of the trait;
2. the loci act in concert in an additive manner, each
adding or subtracting a small amount from the phenotype; and
3. the environment interacts with the genotype to produce the final phenotype.
12.3 DETERMINING THE GENETIC

COMPONENT OF A TRAIT
Historically, the genetic study of any trait can be divided
into four broad categories: familial aggregation, segregation analysis, linkage analysis, and association studies.
This paradigm was useful in discovering genes for many
monogenic disorders.
12.3.1 Familial Aggregation

The first step of any genetic analysis is to establish a
genetic component to the disease. In addition, one must
establish the relative size of the genetic effect in comparison to other sources of variation, such as common
household effect and random environmental effect.
Familial aggregation can be established using familybased or twin/adoption studies. Family members share
genes and environment; therefore, familial aggregation
of the trait could be due to genetic and environment
together. In general, very few traits are influenced only
by genes or only by the environment. Detection and
estimation of familial aggregation is the first step in the
genetic analysis of any multifactorial trait. Twin and
adoption studies are traditionally used to determine the
genetic component of the trait (811). As monozygotic
TA B L E 1 2 - 3 Using Twin Concordance and

Discordance Rates to Test for a
Genetic Component of a Disease
Twins
MZ
DZ
21 =
Concordant
Pair
Discordant
Pair
Total Pairs
n11
n21
nC
n12
n22
nD
nMZ
nDZ
n
n(n11 n22 n12 n21 )2

nC nD nMZ nDZ
(MZ) twins share all their nuclear genes, any difference

between them regarding a particular trait should be due
solely to environmental effects. If the trait is completely
influenced by genes, then MZ twins should have essentially identical expression of the trait. This is not true
for dizygotic (DZ) twins because, on average, they share
only 50% of their genes.
Twin studies to determine the genetic component of
the threshold character are based on comparing concordance rates of MZ and DZ twins. If both members of a
twin pair have the same status of a dichotomous trait
(i.e. either both have the disease or both do not have
the disease), they are concordant. If they do not share
the trait status, they are discordant. The concordance
rate is the proportion of concordant twin pairs among
all those with the trait. Significantly higher concordance
rates in MZ twins compared with DZ twins is considered
an evidence for a significant genetic component of the
disease. The significance of the difference can be easily
tested by a 22 contingency table using a chi-square test
(see Table12-3).
However, concordance rates are not appropriate for
continuous traits, so correlation coefficients can be used
instead of concordance rates (12).
The method to determine the degree of genetic component of a continuous trait is based on a comparison of
the variance of the differences between MZ twins and
differences between DZ twins. As MZ twins share all
their genes, the variance of the trait between MZ twins
(VMZ) must be due to environmental variance (VE), so
in this case, we have VMZ=VE. However, the variance
of the trait between the DZ twins (VDZ) could be due
to both environment (VE) and shared genes (VG). So,
genetic variance is VG=VDZVMZ, and therefore, the
heritability, h2, is defined as
h2 =
VDZ VMZ
VMZ
Heritability ranges between 0 and 1, with 0 meaning

a solely environmentally determined trait and 1 meaning
a completely genetically determined trait.
Adoption studies provide a second familial aggregation strategy for estimating the influence of genes on
multifactorial traits. The strategy consists of comparing

disease rates among the adopted offspring of affected
parents with the rates among adopted offspring of unaffected parents. Certain biases can influence these studies,
namely, (1) parental environment could have long-lasting
effects on an adopted child; (2) adoption agencies attempt
to match the adoptive parents with natural parents in
terms of socioeconomic status; (3) children might be several years old when adopted, introducing the potential for
many environmental confounds. Moreover, these studies
are reasonably good at estimating additive genetic effects
that are not age specific, but poor at estimating nonadditive genetic effects or genetic effects that are expressed
differently across the age span.
For continuous traits, the familial aggregation is usually measured by heritability, which is the proportion
of variability of the trait explained by genetic variation.
Heritability can be defined as either using total genetic
effects (sum of additive, dominant, and epistatic effects)
or using only additive effects. The former quantity is
called heritability in the broad sense and is given by
h2 =
Var(G)
Var(T)
where Var(G) and Var(T) are genetic and total variance,

respectively. The latter quantity is called heritability in
the narrow sense and is given by
h2 =
Var(A)
Var(T)
where Var(A) is the additive genetic variance.

One cannot conclude the number of genes or which
genes are involved in the etiology of the trait from a
heritability estimate. Although the absence of familial
aggregation is generally thought to rule out a genetic
contribution the trait, there are some unlikely yet plausible scenarios where this is not so, including but not limited to phenotypic competition within families (13) that
counters genetic effects, and an extreme form of epistasis referred to by some as emergenesis (14). It is also
important to emphasize that heritability is a populationspecific estimate, and thus can vary from population to
population.
There are many other methods to detect and estimate
familial aggregation using family data. For example, the
recurrence risk is often used to determine the strength of
familial aggregation for a discrete trait. The recurrence
risk is the probability that a relative of an affected individual is also affected. The most commonly used measure is the sibling recurrence risk, i.e. the probability that
a sibling of an affected individual is also affected. The
ratio of the sibling recurrence risk and the overall disease
prevalence is called a sibling relative risk. It is one of the
measures of the magnitude of the genetic contribution
to susceptibility for a dichotomous trait (affected versus unaffected). Examination of relative recurrence risk
values for various classes of relatives could suggest that

the trait is influenced by multiple loci (15). For a single
ascertainment scheme, the sibling recurrence risk can be
calculated from sibling data as follows (16):
(a 1)ns(a)
Ks = s = 1 a = 1
s=1
a = 1 (s 1)ns(a)
where a=number of affected sibs in a sibship, s=number

of siblings in the sibship, and ns(a)=number of sibships of
size s with a affected sibs.
Note that the familial aggregation methods given
above use only trait information from the sample. Owing
to the availability of genome-wide single nucleotide polymorphism (SNP) data, it is now feasible to calculate the
heritability using genome-wide SNP markers. One such
method was proposed by Visscher etal. (17), who used
genome-wide identity-by-descent (IBD) sharing probability between full sibs using genome-wide SNPs data.
12.3.2 Segregation Analysis

Once a genetic basis of the trait has been established,
traditionally, the next step has been to determine the
genetic models that explain the segregation of a phenotype (continuous, dichotomous, or ordinal) in a given
familial data set via segregation analysis. Segregation
analysis requires phenotypic data on related individuals and does not require any molecular data. Segregation analysis is the statistical methodology to determine
whether a model with one or more major genes and/or
polygenes (i.e. a set of genes each with small quantitative effect that together produce a phenotype) is consistent with the observed pattern of phenotypic inheritance,
and to estimate the parameters of the best-fitting genetic
model. It entails determining the mode of inheritance
(additive, recessive, or dominant), estimating disease
allele frequency, and estimating penetrances (probability
of being affected given genotype). At one time, segregation analysis was one of the most important tools for
genetic analysis of familial data. In the late 1980s, large
numbers of DNA markers became available, which rendered segregation analysis less popular.
If the trait is monogenic and thus owing to single
major gene effect, segregation analysis has proven to be
a very effective tool in determining the parameters for
mode of inheritance. Subsequently, these parameters
have been used in model-based linkage analysis (see
below for more detail) to find the location of putative
disease-causing genes. This paradigm has been used successfully for the simple Mendelian traits, in which only
one gene is segregating. For multifactorial traits, which
may be due to the effects of many genes and environmental effects, estimation of the genetic model may be
virtually impossible using segregation analysis.
In order to determine the parameters of the genetic
model using segregation analysis, the likelihood of a
particular mode of inheritance can be formulated using
three types of probability functions. First, there is a
probability distribution for segregation of genotypes

among the founders (individuals whose parents are not
included in the observed pedigree data), where genotypes
of the founders are independently drawn from the population based on the prevalence of the disease and mode
of inheritance. Second, there is a probability distribution
of the segregating genotypes of the non-founders (individuals with both parents in the pedigree) conditional
on their parental genotypes. Third, there are penetrance
functions (probability of being affected given a particular genotype). To test whether there is a segregation of
a single gene, the likelihood under the assumed genetic
model is compared with the likelihood under the null
model of segregation with no genetic effect. The more
complicated or general model could be included for testing a particular mode of inheritance including polygenic
or multifactorial components in modeling the disease.
However, the number of possible genetic models with a
given mode of inheritance may be too large to make any
meaningful inference about the disease model.
12.3.3 Linkage Analysis

Genetic linkage analysis is based on the observation that
any two loci, which are in close proximity to each other
on the same chromosome, will tend to cosegregate among
related individuals more often than two random loci in
the genome. Thus, the affected individuals sharing a genotype at a putative disease locus would be more likely to
share a genotype at linked marker loci. In the absence of
linkage, the recombination fraction (i.e. =the proportion of gametes in which two genes on the same parental
chromosome are separated; for more details, see http://
en.wikipedia.org/wiki/Genetic_linkage) is ; however, if
there is a linkage, the recombination fraction is <. If
we cover the entire human genome using evenly spaced
markers across the chromosomes, it will be possible to
find marker loci associated with a given trait of interest.
The methods of linkage analysis can be divided into
two broad classes: model-based (parametric) and modelfree (sometimes referred to as nonparametric) linkage
analysis.
12.3.3.1 Model-Based Analysis. Model-based methods assume a specific genetic model underlying the trait,
and the statistical evidence in favor of linkage with a
marker locus is summarized by the maximum value of
the lod score (18). The lod score is the logarithm of the
likelihood ratio of observing a particular set of family
data under a specific alternative hypothesis of linkage
relative to the null hypothesis of no linkage between disease and marker loci (H0: =). For details of how the
likelihood is formulated, see Elston and Stewart (19),
Elston and Rao (20), and Lander and Green (21).
This approach has been remarkably successful in
identifying genes responsible for Mendelian disorders.
In order to calculate such likelihood for families, we
must specify a probability model. Several assumptions
are usually made in calculating lod scores: the mode of

inheritance underlying the marker and the trait is known,
the parameters such as penetrances and allele frequencies
at both marker and trait loci are known without error,
and all founders are unrelated to each other. Misspecification of any of these assumptions can affect the validity
or power of the analysis and can result in an inconsistent
estimate of the recombination fraction. Thus, the models
used in model-based analysis must approximate the complexity of the disease being investigated. It is noteworthy
that incorrect specification of a legitimate model-based
linkage test may reduce power but generally does not
lead to an inflated Type I error rate (false-positive rate).
12.3.3.2 Model-Free Linkage Analysis. The genetic
mechanism underlying a complex disease is often
unknown, and it is impossible to specify the correct
genetic parameters such as mode of inheritance, disease
allele frequency, and penetrance in complex diseases.
Under these circumstances, model-free linkage analysis,
which makes no assumption about the mode of inheritance of the trait, is usually preferred. If a disease susceptibility locus and a marker locus are linked, and, by
definition, cosegregating in a family, pairs of relatives
who are concordant for the disease (i.e. both affected
or neither affected) should share more alleles identicalby-descent (IBD5) than will an average pair of relatives
with the same degree of kinship. Similarly, discordant
(affectedunaffected) relative pairs should share fewer
alleles IBD than will an average pair of relatives with
the same degree of kinship at the disease locus or marker
linked to the disease locus. Model-free methods were first
derived for sib pairs (22,23), but were extended to other
relative pairs (24,25). The HasemanElston method
consists of regressing the squared phenotypic difference
among siblings within sib pairs on the estimated proportion of alleles the sibs share IBD at a marker locus. A
negative slope suggests linkage because it indicates that
greater similarity at a trait locus tends to occur with
greater similarity at a marker locus. There have been a
number of extensions to the HasemanElston method to
increase its power (2635). In 2003, Human Heredity
published special topic issue titled Recent Advances in
the Analysis of Genetic Traits celebrating the thirtieth
anniversary of the seminal paper by Haseman and Elston
for quantitative trait linkage analysis (Human Heredity
2003, Vol. 55, No. 23). Several software programs are
available to perform linkage analyses, including S.A.G.E.
(36), GENEHUNTER (37), LINKAGE (3840), Mx
(http://www.vcu.edu/mx/) and MERLIN (41).
The model-free methods described above mostly utilize
relative pairs such as sib pairs. However, large extended
families could provide more linkage information than
5IBD
alleles in an individual or in two people are defined as alleles

that are identical because they have both been inherited from the
same common ancestor, as opposed to identity-by-state.from
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=hmg.
these relative pairs. Methods based on the variance components framework have become a popular choice for
linkage analysis because of ease of modeling covariates
and genegene or geneenvironment interactions, and
can utilize large extended families in the model-free environment (4246). There are several programs available
to perform linkage analyses based on variance components methodology such as ACT (43), SOLAR (42), and
Mx (http://www.vcu.edu/mx/). Mx uses a structural
equation modeling approach, which is equivalent to the
other types of variance components approaches under
most circumstances. There are many more statistical
genetics packages freely available for public use, and a
complete list along with download information for each
package can be found at http://linkage.rockefeller.edu/
soft/ and http://www.soph.uab.edu/ssg/linkage/lddac.
12.3.4 Transmission Disequilibrium

Test and Association Analysis Using
Familial Data
In classical genetic studies, the identification of a chromosomal region with linkage analysis is the first step in the
gene mapping process. As linkage analysis provides information of a genomic region, a typical quantitative trait
locus (QTL) may cover several millions of base pairs and
may contain hundreds of genes. The initial detection of
the QTL is followed by addition of more markers within
the QTL to narrow down the region as much as possible. Once the resolution limit of the linkage approach is
reached, the most commonly employed follow-up is to
fine map using association analyses with SNPs.
The rationale for association analyses is to confirm
the involvement of a putative allele involved in a trait of
interest. The rationale of fine mapping is that the greater
number of SNPs and the greater sensitivity of the association tests provide more detailed information of the target region. The SNPs located close to a disease locus may
cosegregate because of linkage disequilibrium, i.e. allelic
association due to linkage. The allelic association forms
the theoretical basis for association mapping. Allelic
association that is not due to linkage disequilibrium is of
no interest in mapping disease genes. The simplest way
to test for association is to perform a casecontrol study,
where the cases are the individuals with the disease
and the controls are without the disease. Association is
then tested by ascertaining whether a particular marker
allele is more frequent among the cases than the controls. A significant result will be observed if the marker
is in linkage disequilibrium with the disease locus, or
from a variety of confounding reasons such as population stratification. Therefore, casecontrol association
studies without controlling for stratification are prone
to false-positive results with no biological significance,
and for this reason, association studies were not popular
until the mid-1990s when methods to account for this
stratification were established using familial data (47).

To control for population stratification, Spielman etal.
(47) proposed the transmission disequilibrium test
(TDT), which is a family-based association test (FBAT)
in the presence of linkage that controls for population
stratification by comparing the allele frequencies among
alleles transmitted to an affected offspring to those
that are not transmitted to an affected offspring from
informative parental matings (i.e. matings with at least
one heterozygous parent). This study design requires
the collection of family trios that include two parents
and an affected offspring. More than 225 extensions
and variations of the original TDT have been proposed
(see the exhaustive review of TDT procedures in Tiwari
et al. (48)). There are a number of software programs
available for TDT and/or association analyses using
family data such as FBAT (http://www.biostat.harvard.
edu/~fbat/fbat.htm), ASSOC (http://darwin.cwru.edu/
sage/), and GASSOC (http://mayoresearch.mayo.edu/
schaid_lab/software.cfm). A complete list of association
programs can be found at http://linkage.rockefeller.edu/
and http://www.soph.uab.edu/ssg.asp?id=1087.
Once the results from the association analyses are
deemed adequate, the next step is to screen the candidate
genes for DNA sequence variation by direct sequencing.
The relevance of the detected mutations is confirmed
with additional association studies in the original and
other populations, as well as functional assays in vitro
(expression studies in different cell lines) and in vivo
(transgenic and knock-out animal models) (49).
12.4 THE INTERNATIONAL HapMap

PROJECT
In the context presented above, studies progress from estimates of heritability to segregation analysis to linkage and
then finally familial association analysis to determine candidate genes for a trait of interest. However, this paradigm
has changed recently. With the advent of high-dimensional
genotyping technologies using microarrays, the approach
for discovering new genetic variants for a disease or trait
have changed drastically. In 1996, Lander proposed the
common-disease common-variant (CDCV) hypothesis
(50). The HapMap project was initiated to create dense
set of genetic markers to test the CDCV hypothesis. The
CDCV is based on the idea that the genetic component
of common diseases is attributable in part to common
allelic variants (i.e. alleles with frequency at least 5%).
The draft of the complete human genome sequence was
completed in 2001 and had strong effect in advances in
the genome sequencing technology (51). The International
HapMap Project (www.hapmap.org) was an international
partnership that was formed in 2002 to help researchers
find genes associated with human disease by providing
a public database of common genome-wide human variation across populations (5254). The first stage of the
HapMap project focused on four diverse populations: 30
trios (two parents and adult child) from Yoruba in Ibadan
(90 individuals), Nigeria, 30 trios from the Centre dEtude

du Polymorphisme Humain (CEPH) collection of Utah
residents of Northern and Western European ancestry,
45 unrelated individuals from the Han Chinese in Beijing,
and 45 unrelated individuals from Japanese in Tokyo.
This project genotyped over 1 million SNPs in Phase
I and additional 2.1 million in Phase II in the HapMap
samples (53,54). It helped initiate advances in SNP array
technologies to make genome-wide association studies
(GWAS) feasible and affordable. Affymetrix and Illumina
SNP arrays became available to researchers, which initially
surveyed approximately 100,000 SNPs, and which now
survey 2.5 million SNPs. During the most recent phase,
HapMap 3, 1184 individuals representing 11 global populations were genotyped for approximately 1.6 million
common SNPs (55; http://www.sanger.ac.uk/resources/
downloads/human/hapmap3.html; http://www.broadinstitute.org/~debakker/p3.html; http://hapmap.ncbi.nlm.
nih.gov/).
As a complement, the 1000 Genomes Project, which
began in 2008, was initiated to provide a catalog of lowfrequency SNPs and structural and sequence variants
in the human genome (56; http://www.1000genomes.
org/).
12.5 GENOME-WIDE ASSOCIATION

STUDIES
GWAS is an approach that involves scanning thousands
to a few million SNPs across the whole genome on many
individuals to find association with a disease or trait. As
mentioned earlier, GWAS became a popular choice of
genetic studies to detect putative loci associated with a
disease or trait because of availability of high-throughput
SNP arrays, decreased cost of genotyping, and methods
to correct for population stratification (i.e. systematic differences in allele frequencies between subpopulations in a
given population possibly because of different ancestry).
Before the HapMap era, investigators were reluctant to
conduct association studies using population data because
of concerns about population stratification. For example,
in case-control studies, we usually test association of a particular SNP by comparing allele frequency between cases
and controls. Allele frequencies are known to vary within
and between populations depending on genetic ancestry
(53,57). Genetic ancestry becomes a confounding variable
leading to spurious associations if allele frequencies are
different within or between race/ethnic groups. Methods
for correcting population substructure are described later.
12.5.1 Study Designs

Any type of data set such as pedigree data, case-control
data, or population data are all appropriate choices for
GWAS, but one has to adjust for familial correlations in
pedigree data and population stratification in population
or case-control data sets to control for the confounding
due to relatedness or population substructure. Casecontrol and population data have been commonly used
for GWAS because of availability and convenience of
ascertainment. There are some issues associated with the
case-control design. If the disease is heterogenous, extra
attention should be paid to minimize heterogeneity in
case selection, e.g. selecting the most extreme cases or
selecting individuals from a familial disease cohort. There
has been a controversy in how to select optimal controls.
Usually, controls from the same population and residing
in same geographic area are preferred, but these can be
difficult to ascertain. The Wellcome Trust Case Control
Consortium used 3000 UK controls and 2000 cases from
each of seven different diseases and showed that using
common controls was effective, had minimal impact on
genotypic distributions, and did not lead to excess false
positives (58,59). Misclassification error in control selection could affect the power of the association analysis.
Specifically, this is true for late-onset diseases because
controls have not yet reached the age to develop the disease; this issue can be resolved by increasing the sample
size (58). Population stratification and cryptic relatedness (i.e. relatedness among the individuals in the study
that is not known to the investigator) can also increase
the false-positive findings as previously discussed. The
family-based association studies are robust to population
stratification, but it is difficult to ascertain all pedigree
members, which leads to missing data within families
and loss of power compared to case-control designs (60).
There are other issues with study design selection; an
excellent review is provided by McCarthy etal. (58).
12.5.2 Quality Control

The first step of GWAS analysis is the QC of the genotypic and phenotypic data. There are a number of procedures needed to ensure the quality of genotype data both
at the genotyping laboratory and after calling genotypes
using statistical approaches. Here, we assume that the
genotyping laboratory has used best practices to remove
technical variation, and we present only the statistical
methods that are used after completion of the genotyping. The QC and association analysis of GWAS data can
be performed using the robust, freely available, and open
source software PLINK developed by Purcell etal. (61).
Two recent publications provide excellent reviews of the
QC protocol for GWAS data (62,63). Here, we provide
few important steps of the QC in GWAS in similar guidelines as in Laurie etal. (62) and Turner etal. (63). Note
that the current genotyping technology is very reliable,
but there are still some possibilities of errors when genotyping large number of SNPs.
12.5.3 Sex Inconsistency

It is possible that self-reported sex of the individual is
incorrect. Sex inconsistency can be checked by comparing
the reported sex of each individual with predicted sex by

using X-chromosome markers heterozygosity to determine the sex of the individual empirically.
12.5.4 Relatedness and Mendelian Errors

Another kind of error that can occur in genotyping is
due to sample mix-up, cryptic relatedness, duplications,
and pedigree errors such as self-reported relationships
that are not accurate. To detect sample relatedness, one
can calculate three IBD probabilities of sharing 0, 1,
and 2 alleles that are identical-by-descent for each pair
of individuals using software such as PLINK and a kinship coefficient matrix. Individuals sharing zero alleles at
every locus are unrelated, individuals sharing one allele
IBD at every locus are parentoffspring pairs, individuals sharing two alleles IBD at every locus are MZ twins
or a duplicated sample, and on average, sib pairs share
0, 1, and 2 alleles IBD with sharing probabilities 0.25,
0.5, and 0.25, respectively. The relationship errors can
be corrected by consulting with the self-reported relationships and/or using inferred genetic relationships.
Cryptic relatedness can inflate the variance of the test
statistic (e.g. if the test statistic is the difference in the
overall allele counts between case and control samples in
a trend test (64)). The presence of cryptic relatedness in
case-control studies increases the false positives in association analysis. Devlin and Roeder provided a method
to correct for the variance inflation (see References (64)
and (65) for details).
12.5.5 Batch Effects

For GWAS, samples are processed together for genotyping in a batch. The size and composition of the sample
batch depends on the type of the commercial array; for
example, an affymetrix array can genotype up to 96
samples, and an Illumina array can genotype up to 24
samples. To minimize batch effects, samples should be
randomly assigned to plates with different phenotypes,
sex, race, and ethnicity. The downstream association
study can be confounded by the batch effects. There are
several methods available to detect any batch effects. The
most commonly used method is to compare the average
minor allele frequencies (MAFs) and average genotyping
call rates across all SNPs for each plate. Most genotyping
laboratories perform batch effect detection and usually
regenotype the data if there is a batch effect or a plate is
discarded when there is a large amount of missing data.
12.5.6 Marker and Sample Genotyping

Efficiency or Call Rate
Marker genotyping efficiency is defined as the proportion
of samples with a genotype call for each marker. If large
numbers of samples is not called for a particular marker,
that is, an indication of a poor assay and the marker

should be removed from further analysis. A threshold for
removing markers varies from study to study depending
on the sample size of the study. Usually recommended
call rates are 9899%. If the quality of the DNA sample
is poor, it leads to low call rate of genotypes for the individual, i.e. the number of missing genotypes will be large
and the sample should be excluded from further analysis.
Before performing the association analysis, one should
filter out the samples and markers using some threshold
for marker and sample call rates.
12.5.7 Population Stratification

There are a number of methods proposed to correct for
population substructure. Three commonly used methods to correct for the underlying variation in allele frequencies that leads to confounding due to population
stratification are genomic control (4,6471), structured
association testing (7274), and principal components
(PCs) (75,76). The genomic control method controls for
confounding due to substructure by estimating an inflation factor (ratio of the variance of the test statistic to
the variance under the null hypothesis), then adjusting
the test statistics for all markers in GWAS downward
by the inflation factor. Usually, the inflation factor is
calculated using a few hundred loci. Structure association testing (72,77) (1) estimates ancestry proportions
of each individual from the founding population using
markers with different allele frequencies in the founder
population; and (2) uses theses proportions to cluster
individuals to create homogenous groups with similar
ancestry profile for the association analysis. Principal
components analysis (PCA) uses thousands of markers to detect population stratification and PCs then can
be used to correct for stratification by modeling PCs as
covariates in the model (75,76). PCs can be calculated
using a program Eigenstrat (75,76). There are two issues
with using PCA: (1) how many SNPs to use, and (2)
how many PCs should be included as covariates in the
association analysis.
12.5.8 Marker Allele Frequency and

HardyWeinberg Equilibrium Filter
The HardyWeinberg equilibrium (HWE) test compares the observed genotypic proportion at the marker
versus the expected proportion. Deviation from HWE
at a marker locus can be due to population stratification, inbreeding, selection, nonrandom mating, genotyping error, actual association to the disease or trait under
study, or a deletion or duplication polymorphism. However, HWE is typically used to detect genotyping errors.
SNPs that do not meet HWE at a certain threshold of
significance are usually excluded from further association analysis. It is also important to discard SNPs based
on MAF. Most GWAS studies are powered to detect a
disease association with common SNPs (MAF0.05).
The rare SNPs may lead to spurious results because of

the small number of homozygotes for the minor allele,
genotyping errors, or population stratification.
12.6 ASSOCIATION METHODS/

STATISTICAL ANALYSIS
12.6.1 Discovery Phase of the GWAS
Discovery phase consists of scanning the genome with
at least 1000 cases and 1000 controls with an appropriate statistical test. The choice of the statistical test for
association depends on the study design and the phenotype under consideration. In the case-control design, the
goal is to compare the allele or genotypic frequencies
between cases (affected) and controls (normal). This can
be tested with Pearsons chi-square test, Fishers exact
test, or the CochranArmitage test. Pearsons chi-square
tests the null hypothesis of no association between rows
and columns of the 23 contingency table consisting of
the counts of the three genotypes among cases and controls (78). Fishers exact test is similar to the Pearsons
test, but the deviation from the null hypothesis is calculated exactly from all possible permutations of the data,
and thus, does not assume the asymptotic property as
Pearsons test (79). The CochranArmitage test for trend
is a test of proportions of cases versus controls (8082)
and assumes an additive mode of inheritance that is a linear trend. There is a loss of power if the trend is not linear. Freidlin etal. (83) recommended using a maximum
of the test statistics obtained from additive, dominant, or
recessive effects models. Note that in the above statistical procedures, one cannot model covariates such as sex,
age, race, age of onset, PCs (PCs from admixture), and so
on. To accommodate any relevant covariates in the analysis, one can use logistic regression. Logistic regression
is more flexible in that it can model covariates, multiple
SNPs as main effects, SNP by SNP interactions, SNP by
environment interactions, and so on. If the phenotype
is continuous, analysis of variance and general linear
model approaches can be employed. One can also use a
linear regression framework if extremes of the distribution are used to define case and control status. Huang
and Lin have given an efficient association method
using extreme phenotypes (84). The analysis of familial
data requires correcting for the dependency of observations. The notable methods include linear mixed model
(8587), FBAT (see review by Laird and Lange (88)), or
ASSOC (a module of S.A.G.E. software suite (36)). After
scanning 12 million SNPs (i.e. 12 million statistical
tests) to determine significant associations, appropriate
multiple testing correction is required to control for false
positives and also in choosing SNPs for follow-up studies. The guideline for significant association for GWAS
is generally a p value of ~5108 (60,89), but, it is
common practice to use a higher p value threshold for
follow-up study or replication. Balding (90) provides a
10
comprehensive discussion of the advantages and disadvantages of these methods pertaining to GWAS.
12.6.2 Validation and Replication Phase

Some investigators have recommended reanalysis of the
original discovery phase GWAS data using a different
genotype platform for validation, which has been termed
technical validation (58). Technical validation allows
detection of technical errors in genotyping that might
give rise to spurious association signals or false positives,
but given limitations of the resources available to investigator, it may not be feasible. The replication phase or
follow-up study is one of the most challenging aspects
of the GWAS and is required to control for false positives. Replication in an independent data set with similar
genetic background and phenotype is warranted. Usually
several hundreds or a few thousands of SNPs are tested
in a replication set, depending on the threshold used for
significant association p value. The statistical methods
are the same as in the discovery phase, depending on the
study design and type of phenotype.
12.7 ANALYSIS OF RARE VARIANTS

USING NEW TECHNOLOGIES
Introduction of the HapMap project and large-scale
GWAS studies were driven by the common disease,
common variant (CVCD) hypothesis that was first
introduced in the 1990s (50,91). The CDCV is based on
the idea that the genetic component of common diseases
is attributable in part to allelic variants that are present
in more than 5% of the population. An extension of this
hypothesis is that the same variants will be responsible
for the disease across multiple populations (92). The
early success of GWAS (age-related macular degeneration) seemed to support the theory that a large proportion
of the genetic variants underlying complex disease could
be explained by the CDCV. It is now becoming apparent
that many common variants confer only a small portion
of risk individually and also explain a small portion of
the heritability of common complex diseases (93). While
GWAS have been successful in many ways, identifying
hundreds of variants for a large number of traits (http://
www.genome.gov/26525384), there still remains a large
proportion of heritability that has yet to be explained.
When the CDCV hypothesis was first introduced, it was
not without contention (94). One of the strongest counter
arguments was based on the hypothesis of common disease, rare variant, (CDRV) which is in essence the antithesis of the CDCV hypothesis (77,95). The rare variant
hypothesis proposes that common complex diseases are
due to the combined effect of multiple rare variants with
moderate to low individual risk. Unlike CVCD, it is generally thought that owing to population history, these rare
variants will be population specific (96). It is only recently,
with the availability of affordable large-scale sequencing
technology and advances in analytical methods (discussed

below), that scientists have gained the ability to address
the role of CDRV in human disease. In reality, it is likely
that the genetic basis of complex disease is somewhere
between the two extremes, with multiple genes interacting
together with a variety of common and rare variants and
other genetic and environmental factors (97).
As new high-throughput, massively parallel sequencing
technologies emerged in 2005, direct sequencing became
commonly used to directly interrogate whole genomic
sequences for association with disease without prior specification of SNPs currently available on commercial SNP
chips (98). Such technologies overcome some of the shortcomings of GWAS methods, such as ascertainment bias
in the set of currently available SNPs, and the ability to
assay rare or private variants. In addition, greater flexibility exists in the search for variants other than SNPs, such
as copy number variants, insertions or deletions or indels,
inversions, and so on. Whole exome sequencing, in which
only the sequence of exons are assayed, has been used
to discover causal mutations in a number of Mendelian
disorders such as Millers syndrome (99) and Hereditary
Spastic Paraparesis (100). Because of the enormous number of variants introduced from new sequencing technologies and the small sample sizes typically present, new
bioinformatic and statistical methods have been developed to reduce the dimensionality and improve the probability of detection of causal variants. Prior bioinformatic
processing may include filtering by IBD methods if family
data is present (101), or filtering based on the expected
mode of inheritance (99). In addition, if only rare variants are desired, then common variants can be filtered out
using dbSNP (102), and in addition, predicted functional
variants (nonsense, missense, splice site variants, indels,
frameshift mutations, etc.) can be discerned using tools
such as SIFT and PolyPhen (102,103). Once likely nonfunctional variants have been filtered out, new statistical
methods for summarizing the effect of multiple rare variants at a single gene can be applied. Some examples of
these methods include the cohort allelic sums test method,
which compares the number of individuals with mutations within a gene between cases and controls (104), the
combined multivariate and collapsing (105), which collapses multiple rare variants in conjunction with common
variants using multivariate analysis, methods that weight
the counts of each variant using the estimated standard
deviation of the total number of mutations (106,107), or
a method that models these weights in a flexible Bayesian
framework (108). A review of recent methods is given in
reference (109). Whole-exome and whole-genome studies (in which the contribution of noncoding regions to
disease can be assayed) are currently underway for complex (multifactorial) diseases. The next couple of years
will show if these technologies can help to fill in the gaps
from GWAS studies, termed missing heritability (93),
and in identifying causal variants underlying multifactorial diseases. In addition, new sequencing technologies

offer opportunities for functional characterization studies
such as gene expression profiling using next generation
sequencing (110), epigenetic profiling (111), and in indentifying somatic mutations occurring in cancer (112114).
12.8 INTEGRATION OF GENETIC,

GENOMIC, AND FUNCTIONAL DATA
FOR ANALYSIS OF MULTIFACTORIAL
DISEASES
Genes do not work in isolation, but instead work in
concert to carry out complex biological functions (115).
The field of Systems Biology is an emerging area of
research that aims to study these complex interactions
(116118). Following the Central Dogma of molecular
biology (DNARNAProtein), the ability to integrate
genetic data with other types of information (e.g. gene
expression, protein interaction, etc.) is essential to our
ability to understand the underlying complex etiology.
A systems biology approach to complex diseases is possible in part due to the ever increasing amount of omics
data (e.g. genomics, proteomics, metabolomics, etc.) that
is becoming available in public databases, the scientific
literature, and elsewhere. There are a wealth of methods that have been developed, or are being developed,
that are designed to utilize this wealth of data to help
prioritize and interpret results of genetic association
studies. One of the most popular types of methods is
pathway analysis, often referred to as gene set analysis, which can be broadly defined as methods that use
preexisting information about SNPs, genes, transcripts,
or other genomic information to interrogate subsets
(e.g. pathways) of genomic variables as they relate to a
complex trait or disease. These methods were first developed for genome-wide gene expression studies, but have
recently been applied to GWAS, and will have great utility in sequencing-based studies. These methods typically
examine test statistics to determine if the members of a
group of genes are enriched for association with a trait
(e.g. gene set enrichment analysis (119,120)), or to test
if the group itself is associated with the trait (e.g. gene
set ridge regression in association studies, (121,122)).
For additional information on these methods, see references (123129). Inclusion of prior network knowledge
or de novo network discovery in conjunction with pathway methods has also been shown to further improve
genomic data discoveries, with significant examples in
cancer as exemplified in (130132).
12.9 CONCLUSIONS
Genetic modeling is a challenging art and science. Advances in molecular technology, statistical
methodology,6 and increasing availability of large
6A
steady stream of videos offering tutelage on these advances can

be freely seen at: http://www.soph.uab.edu/ssg/courses/ssgseminars.
11
samples allow many new investigations to be undertaken

on unprecedented scales. Interpretation of the resulting
findings remains both difficult and one of the more exciting challenges facing todays biomedical researchers.
REFERENCES
1. Badano, J. L.; Katsanis, N. Beyond Mendel: An Evolving
View of Human Genetic Disease Transmission. Nat. Rev.
Genet. 2002 Oct, 3 (10), 779789.
2. Fisher, R. A. The Correlation between Relatives on the Supposition of Mendelian Inheritance. Trans. R. Soc. Edinb.
1918, 52, 399433.
3. Falconer, D. S. The Inheritance of Liability to Certain Diseases, Estimated from the Incidence among Relatives. Ann.
Hum. Genet. 1965 Aug, 29, 5176.
4. Reich, T.; James, J. W.; Morris, C. A. The Use of Multiple
Thresholds in Determining the Mode of Transmission of
Semi-Continuous Traits. Ann. Hum. Genet. 1972 Nov, 36
(2), 163184.
5. Chakraborty, R. The Inheritance of Pyloric Stenosis Explained
by a Multifactorial Threshold Model with Sex Dimorphism
for Liability. Genet. Epidemiol. 1986, 3 (1), 115.
6. Dronamraju, K. R.; Bixler, D.; Majumder, P. P. Fetal Mortality Associated with Cleft Lip and Cleft Palate. Johns Hopkins Med. J. 1982 Dec, 151 (6), 287289.
7. Dronamraju, K. R.; Bixler, D. Fetal Mortality in Oral Cleft
Families(IV): The Doubling Effect. Clin. Genet. 1983 Jul,
24 (1), 2225.
8. Elston, R. C.; Boklage, C. E. An Examination of Fundamental Assumptions of the Twin Method. Prog. Clin. Biol. Res.
1978, 24A, 189199.
9. Hopper, J. L. Twin Concordance. In Encyclopedia of Biostatistics; John Wiley: New York, 1998; Vol. 6, pp 46264629.
10. Karlin, S.; Cameron, E. C.; Williams, P. T. Sibling and
ParentOffspring Correlation Estimation with Variable
Family Size. Proc. Natl. Acad. Sci. U.S.A. 1981 May, 78 (5),
26642668.
11. Neale, M. C. Adoption Studies. In Encyclopedia of Biostatistics; John Wiley: New York, 1998; Vol. 1, pp 7781.
12. Neale, M. C.; Cardon, L. R. Methodology for Genetic Studies of Twins and Families; Kluwer: London, 1992.
13. Carey, G. Sibling Imitation and Contrast Effects. Behav.
Genet. 1986 May, 16 (3), 319341.
14. Lykken, D. T.; McGue, M.; Tellegen, A.; Bouchard, T. J., Jr.
Emergenesis. Genetic Traits That May Not Run in Families.
Am. Psychol. 1992 Dec, 47 (12), 15651577.
15. Risch, N. Linkage Strategies for Genetically Complex Traits.
I. Multilocus Models. Am. J. Hum. Genet. 1990 Feb, 46 (2),
222228.
16. Olson, J. M.; Cordell, H. J. Ascertainment Bias in the Estimation of Sibling Genetic Risk Parameters. Genet. Epidemiol. 2000 Mar, 18 (3), 217235.
17. Visscher, P. M.; Medland, S. E.; Ferreira, M. A.; Morley,
K. I.; Zhu, G.; Cornes, B. K.; Montgomery, G. W.; Martin, N. G. Assumption-Free Estimation of Heritability from
Genome-Wide Identity-by-Descent Sharing between Full Siblings. PLoS Genet. 2006 Mar, 2 (3), e41, Epub 2006 Mar 24.
18. Morton, N. E. Sequential Tests for the Detection of Linkage.
Am. J. Hum. Genet. 1955 Sep, 7 (3), 277318.
19. Elston, R. C.; Stewart, J. A General Model for the Genetic
Analysis of Pedigree Data. Hum. Hered. 1971, 21 (6),
523542.
20. Elston, R. C.; Rao, D. C. Statistical Modeling and Analysis
in Human Genetics. Annu. Rev. Biophys. Bioeng. 1978, 7,
253286, Review.
12
21. Lander, E. S.; Green, P. Construction of Multilocus Genetic

Linkage Maps in Humans. Proc. Natl. Acad. Sci. U.S.A.
1987 Apr, 84 (8), 23632367.
22. Haseman, J. K.; Elston, R. C. The Investigation of Linkage
between a Quantitative Trait and a Marker Locus. Behav.
Genet. 1972 Mar, 2 (1), 319.
23. Penrose, L. S. The Detection of Autosomal Linkage in Data
Which Consist of Pairs of Brothers and Sisters of Unspecified
Parentage. Ann. Eugen. 1935, 6, 133138.
24. Amos, C. I.; Elston, R. C.; Wilson, A. F.; Bailey-Wilson, J. E.
A More Powerful Robust Sib-Pair Test of Linkage for Quantitative Traits. Genet. Epidemiol. 1989, 6 (3), 435449.
25. Olson, J. M.; Wijsman, E. M. Linkage between Quantitative
Trait and Marker Loci: Methods Using All Relative Pairs.
Genet. Epidemiol. 1993, 10 (2), 87102.
26. Drigalenko, E. How Sib Pairs Reveal Linkage. Am. J. Hum.
Genet. 1998 Oct, 63 (4), 12421245.
27. Forrest, W. F. Weighting Improves the New Haseman
Elston Method. Hum. Hered. 2001, 52 (1), 4754.
28. Gerhard, D.; Hothorn, L. A. Rank Transformation in
HasemanElston Regression Using Scores for Location-Scale
Alternatives. Hum. Hered. 2010, 69 (3), 143151.
29. Sham, P. C.; Purcell, S. Equivalence between Haseman
Elston and Variance-Components Linkage Analyses for Sib
Pairs. Am. J. Hum. Genet. 2001 Jun, 68 (6), 15271532.
30. Sham, P. C.; Purcell, S.; Cherny, S. S.; Abecasis, G. R. Powerful Regression-Based Quantitative-Trait Linkage Analysis of
General Pedigrees. Am. J. Hum. Genet. 2002 Aug, 71 (2),
238253.
31. Shete, S.; Jacobs, K. B.; Elston, R. C. Adding Further Power
to the Haseman and Elston Method for Detecting Linkage
in Larger Sibships: Weighting Sums and Differences. Hum.
Hered. 2003, 55 (23), 7985.
32. Visscher, P. M.; Hopper, J. L. Power of Regression and Maximum Likelihood Methods to Map QTL from Sib-Pair and
DZ Twin Data. Ann. Hum. Genet. 2001 Nov, 65 (Pt 6),
583601.
33. Wang, T.; Elston, R. C. A Modified Revisited Haseman
Elston Method to Further Improve Power. Hum. Hered.
2004, 57 (2), 109116.
34. Wright, F. A. The Phenotypic Difference Discards Sib-Pair
QTL Linkage Information. Am. J. Hum. Genet. 1997 Mar,
60 (3), 740742.
35. Xu, X.; Weiss, S.; Xu, X.; Wei, L. J. A Unified Haseman
Elston Method for Testing Linkage with Quantitative Traits.
Am. J. Hum. Genet. 2000 Oct, 67 (4), 10251028.
36. S.A.G.E. 6.x [2010]. Statistical Analysis for Genetic Epidemiology, http://darwin.cwru.edu/sage/.
37. Kruglyak, L.; Daly, M. J.; Reeve-Daly, M. P.; Lander, E. S.
Parametric and Nonparametric Linkage Analysis: A Unified
Multipoint Approach. Am. J. Hum. Genet. 1996, 58, 1347
1363.
38. Lathrop, G. M.; Lalouel, J. M. Easy Calculations of Lod
Scores and Genetic Risks on Small Computers. Am. J. Hum.
Genet. 1984 Mar, 36, 460465.
39. Lathrop, G. M.; Lalouel, J. M.; Julier, C.; Ott, J. Strategies
for Multilocus Linkage Analysis in Humans. Proc. Natl.
Acad. Sci. U.S.A. 1984 Jun, 81, 34433446.
40. Lathrop, G. M.; Lalouel, J. M.; White, R. L. Construction
of Human Linkage Maps: Likelihood Calculations for Multilocus Linkage Analysis. Genet. Epidemiol. 1886 Feb, 3,
3952.
41. Abecasis, G. R.; Cherny, S. S.; Cookson, W. O.; Cardon, L. R.
MerlinRapid Analysis of Dense Genetic Maps Using Sparse
Gene Flow Trees. Nat. Genet. 2002 Jan, 30 (1), 97101.
42. Almasy, L.; Blangero, J. Multipoint Quantitative-Trait Linkage Analysis in General Pedigrees. Am. J. Hum. Genet. 1998
May, 62 (5), 11981211.
43. Amos, C. I. Robust Variance-Components Approach for

Assessing Genetic Linkage in Pedigrees. Am. J. Hum. Genet.
1994 Mar, 54 (3), 535543.
44. Amos, C. I.; Zhu, D. K.; Boerwinkle, E. Assessing Genetic
Linkage and Association with Robust Components of Variance Approaches. Ann. Hum. Genet. 1996 Mar, 60 (Pt 2),
143160.
45. Goldgar, D. E. Multipoint Analysis of Human Quantitative
Genetic Variation. Am. J. Hum. Genet. 1990 Dec, 47 (6),
957967.
46. Schork, N. J. Extended Multipoint Identity-by-Descent
Analysis of Human Quantitative Traits: Efficiency, Power,
and Modeling Considerations. Am. J. Hum. Genet. 1993
Dec, 53 (6), 13061319.
47. Spielman, R. S.; McGinnis, R. E.; Ewens, W. J. Transmission
Test for Linkage Disequilibrium: The Insulin Gene Region
and Insulin-Dependent Diabetes Mellitus (IDDM). Am. J.
Hum. Genet. 1993 Mar, 52 (3), 506516.
48. Tiwari, H. K.; Barnholtz-Sloan, J.; Wineinger, N.; Padilla,
M. A.; Vaughan, L. K.; Allison, D. B. Review and Evaluation
of Methods Correcting for Population Stratification with
a Focus on Underlying Statistical Principles. Hum. Hered.
2008, 66 (2), 6786.
49. Page, G. P.; George, V.; Go, R. C.; Page, P. Z.; Allison, D.
B. Are We There Yet?: Deciding When One Has Demonstrated Specific Genetic Causation in Complex Diseases and
Quantitative Traits. Am. J. Hum. Genet. 2003 Oct, 73 (4),
711719, Review.
50. Lander, E. S. The New Genomics: Global Views of Biology.
Science 1996 Oct 25, 274 (5287), 536539.
51. Lander, E. S.; Linton, L. M.; Birren, B.; Nusbaum, C.; Zody, M.
C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh,
W., etal. International Human Genome Sequencing Consortium Initial Sequencing and Analysis of the Human Genome.
Nature 2001 Feb 15, 409 (6822), 860921.
52. International HapMap Consortium. The International HapMap Project. Nature 2003 Dec 18, 426 (6968), 789796.
53. International HapMap Consortium A. Haplotype Map
of the Human Genome. Nature 2005 Oct 27, 437 (7063),
12991320.
54. International HapMap Consortium; Frazer, K. A.; Ballinger,
D. G.; Cox, D. R.; Hinds, D. A.; Stuve, L. L.; Gibbs, R. A.;
Belmont, J. W.; Boudreau, A.; Hardenbol, P., etal. A Second
Generation Human Haplotype Map of Over 3.1 million SNPs.
Nature 2007 Oct 18, 449 (7164), 851861.
55. International HapMap 3 Consortium; Altshuler, D. M.;
Gibbs, R. A.; Peltonen, L.; Altshuler, D. M.; Gibbs, R. A.;
Peltonen, L.; Dermitzakis, E.; Schaffner, S. F.; Yu, F., etal.
Integrating Common and Rare Genetic Variation in Diverse
Human Populations. Nature 2010 Sep 2, 467 (7311), 5258.
56. 1000 Genomes Project Consortium. A Map of Human
Genome Variation from Population-Scale Sequencing.
Nature 2010 Oct 28, 467 (7319), 10611073, Erratum in:
Nature 2011 May 26, 473 (7348), 544.
57. Stephens, J. C.; Schneider, J. A.; Tanguay, D. A.; Choi, J.;
Acharya, T.; Stanley, S. E.; Jiang, R.; Messer, C. J.; Chew,
A.; Han, J. H., etal. Haplotype Variation and Linkage Disequilibrium in 313 Human Genes. Science 2001 Jul 20, 293
(5529), 489493.
58. McCarthy, M. I.; Abecasis, G. R.; Cardon, L. R.; Goldstein,
D. B.; Little, J.; Ioannidis, J. P.; Hirschhorn, J. N. GenomeWide Association Studies for Complex Traits: Consensus,
Uncertainty and Challenges. Nat. Rev. Genet. 2008 May, 9
(5), 356369, Review.
59. Wellcome Trust Case Control Consortium Genome-Wide
Association Study of 14,000 Cases of Seven Common Diseases and 3,000 Shared Controls. Nature 2007 Jun 7, 447
(7145), 661678.

60. Risch, N.; Merikangas, K. The Future of Genetic Studies of
Complex Human Diseases. Science 1996 Sep 13, 273 (5281),
15161517.
61. Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira,
M. A.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P. I.; Daly,
M. J., etal. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am. J. Hum.
Genet. 2007 Sep, 81 (3), 559575.
62. Laurie, C. C.; Doheny, K. F.; Mirel, D. B.; Pugh, E. W.;
Bierut, L. J.; Bhangale, T.; Boehm, F.; Caporaso, N. E.; Cornelis, M. C.; Edenberg, H. J., etal. GENEVA Investigators
Quality Control and Quality Assurance in Genotypic Data
for Genome-Wide Association Studies. Genet. Epidemiol.
2010 Sep, 34 (6), 591602.
63. Turner, S.; Armstrong, L. L.; Bradford, Y.; Carlson, C. S.;
Crawford, D. C.; Crenshaw, A. T.; de Andrade, M.; Doheny,
K. F.; Haines, J. L.; Hayes, G., etal. Quality Control Procedures for Genome-Wide Association Studies. Curr. Protoc.
Hum. Genet. 2011 Jan, Chapter 1:Unit1.19.
64. Devlin, B.; Roeder, K. Genomic Control for Association
Studies. Biometrics 1999 Dec, 55 (4), 9971004.
65. Voight, B. F.; Pritchard, J. K. Confounding from Cryptic Relatedness in Case-Control Association Studies. PLoS
Genet. 2005 Sep, 1 (3), e32.
66. Bacanu, S. A.; Devlin, B.; Roeder, K. The Power of
Genomic Control. Am. J. Hum. Genet. 2000 Jun, 66 (6),
19331944.
67. Dadd, T.; Weale, M. E.; Lewis, C. M. A Critical Evaluation of Genomic Control Methods for Genetic Association
Studies. Genet. Epidemiol. 2009 May, 33 (4), 290298,
Review.
68. Devlin, B.; Roeder, K.; Wasserman, L. Genomic Control, a
New Approach to Genetic-Based Association Studies. Theor.
Popul. Biol. 2001 Nov, 60 (3), 155166, Review.
69. Devlin, B.; Bacanu, S. A.; Roeder, K. Genomic Control to the
Extreme. Nat. Genet. 2004 Nov, 36 (11), 11291130.
70. Zheng, G.; Freidlin, B.; Li, Z.; Gastwirth, J. L. Genomic Control for Association Studies Under Various Genetic Models.
Biometrics 2005 Mar, 61 (1), 186192.
71. Zheng, G.; Freidlin, B.; Gastwirth, J. L. Robust Genomic
Control for Association Studies. Am. J. Hum. Genet. 2006
Feb, 78 (2), 350356.
72. Pritchard, J. K.; Rosenberg, N. A. Use of Unlinked Genetic
Markers to Detect Population Stratification in Association
Studies. Am J Hum Genet. 1999 Jul, 65 (1), 220228.
73. Pritchard, J. K.; Stephens, M.; Rosenberg, N. A.; Donnelly,
P. Association Mapping in Structured Populations. Am. J.
Hum. Genet. 2000 Jul, 67 (1), 170181.
74. Redden, D.; Divers, J.; Vaughan, L.; Tiwari, H.; Beasley,
T.; Fernandez, J.; Kimberly, R.; Feng, R.; Padilla, M.;
Lui, N., et al. Regional Admixture Mapping and Structured Association Testing: Conceptual Unification and an
Extensible General Linear Model. PLoS Genet. 2006 Aug
25, 2 (8), e137.
75. Patterson, N.; Price, A. L.; Reich, D. Population Structure
and Eigen Analysis. PLoS Genet. 2006 Dec, 2 (12), e190.
76. Price, A. L.; Patterson, N. J.; Plenge, R. M.; Weinblatt, M.
E.; Shadick, N. A.; Reich, D. Principal Components Analysis Corrects for Stratification in Genome-Wide Association
Studies. Nat. Genet. 2006 Aug, 38 (8), 904909.
77. Pritchard, J. K.; Cox, N. J. The Allelic Architecture of
Human Disease Genes: Common DiseaseCommon Variant
or Not? Hum. Mol. Genet. 2002 Oct 1, 11 (20), 24172423.
78. Pearson, K. On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System
of Variables is Such That It Can Be Reasonably Supposed to
Have Arisen from Random Sampling. Philos. Mag. 1900, 50
(302), 157175, Series 5.
13
79. Fisher, R. A. On the Interpretation of 2 from Contingency

Tables, and the Calculation of P. J. R. Stat. Soc. 1922, 85 (1),
8794.
80. Armitage, P. Tests for Linear Trends in Proportions and Frequencies. Biometrics 1955, 11 (3), 375386.
81. Cochran, W. G. Some Methods for Strengthening the Common Chi-Square Tests. Biometrics 1954, 10 (4), 417451.
82. Sasieni, P. From Genotypes to Genes: Doubling the Sample
Size. Biometrics 1997 Dec, 53 (4), 12531261.
83. Freidlin, B.; Zheng, G.; Li, Z.; Gastwirth, J. L. Trend Tests
for CaseControl Studies of Genetic Markers: Power,
Sample Size and Robustness. Hum. Hered. 2002, 53 (3),
146152.
84. Huang, B. E.; Lin, D. Y. Efficient Association Mapping of
Quantitative Trait Loci with Selective Genotyping. Am. J.
Hum. Genet. 2007, 80, 567576.
85. Aulchenko, Y. S.; de Koning, D. J.; Haley, C. Genome-Wide
Rapid Association Using Mixed Model and Regression: A
Fast and Simple Method for Genome-Wide Pedigree-Based
Quantitative Trait Loci Association Analysis. Genetics 2007
Sep, 177 (1), 577585.
86. Kang, H. M.; Zaitlen, N. A.; Wade, C. M.; Kirby, A.; Heckerman, D.; Daly, M. J.; Eskin, E. Efficient Control of Population Structure in Model Organism Association Mapping.
Genetics 2008 Mar, 178 (3), 17091723.
87. Zhang, Z.; Ersoz, E.; Lai, C. Q.; Todhunter, R. J.; Tiwari, H.
K.; Gore, M. A.; Bradbury, P. J.; Yu, J.; Arnett, D. K.; Ordovas, J. M., etal. Mixed Linear Model Approach Adapted for
Genome-Wide Association Studies. Nat. Genet. 2010 Apr,
42 (4), 355360.
88. Laird, N. M.; Lange, C. Family-Based Designs in the Age
of Large-Scale Gene-Association Studies. Nat. Rev. Genet.
2006 May, 7 (5), 385394, Review.
89. Hoggart, C. J.; Clark, T. G.; De Iorio, M.; Whittaker, J. C.;
Balding, D. J. Genome-Wide Significance for Dense SNP and
Resequencing Data. Genet. Epidemiol. 2008 Feb, 32 (2),
179185.
90. Balding, D. J. A Tutorial on Statistical Methods for Population Association Studies. Nat. Rev. Genet. 2006 Oct, 7 (10),
781791, Review.
91. Chakravarti, A. Population GeneticsMaking Sense Out
of Sequence. Nat. Genet. 1999 Jan, 21 (Suppl. 1), S56S60,
Review.
92. Lohmueller, K. E.; Mauney, M. M.; Reich, D.; Braverman,
J. M. Variants Associated with Common Disease Are Not
Unusually Differentiated in Frequency Across Populations.
Am. J. Hum. Genet. 2006 Jan, 78 (1), 130136.
93. Manolio, T. A.; Collins, F. S.; Cox, N. J.; Goldstein, D. B.;
Hindorff, L. A.; Hunter, D. J.; McCarthy, M. I.; Ramos, E.
M.; Cardon, L. R.; Chakravarti, A., etal. Finding the Missing Heritability of Complex Diseases. Nature 2009 Oct 8,
461 (7265), 747753, Review.
94. Terwilliger, J. D.; Hiekkalinna, T. An Utter Refutation of
the Fundamental Theorem of the HapMap. Eur. J. Hum.
Genet. 2006 Apr, 14 (4), 426437.
95. Terwilliger, J. D.; Gring, H. H. Update to Terwilliger and
Grings Gene Mapping in the 20th and 21st centuries
(2000): Gene Mapping When Rare Variants Are Common
and Common Variants Are Rare. Hum. Biol. 2009 Dec, 81
(56), 729733.
96. Bodmer, W.; Bonilla, C. Common and Rare Variants in Multifactorial Susceptibility to Common Diseases. Nat. Genet.
2008 Jun, 40 (6), 695701.
97. Zondervan, K. T.; Cardon, L. R. The Complex Interplay
among Factors That Influence Allelic Association. Nat. Rev.
Genet. 2004 Feb, 5 (2), 89100.
98. Mardis, E. R. A Decades Perspective on DNA Sequencing
Technology. Nature 2011 Feb 10, 470 (7333), 198203.
14
99. Ng, S. B.; Buckingham, K. J.; Lee, C.; Bigham, A. W.; Tabor,
H. K.; Dent, K. M.; Huff, C. D.; Shannon, P. T.; Jabs, E.
W.; Nickerson, D. A., etal. Exome Sequencing Identifies the
Cause of a Mendelian Disorder. Nat. Genet. 2010 Jan, 42
(1), 3035.
100. Erlich, Y.; Edvardson, S.; Hodges, E.; Zenvirt, S.; Thekkat,
P.; Shaag, A.; Dor, T.; Hannon, G. J.; Elpeleg, O. Exome
Sequencing and Disease-Network Analysis of a Single Family
Implicate a Mutation in KIF1A in Hereditary Spastic Paraparesis. Genome. Res. 2011 May, 21 (5), 658664.
101. Rdelsperger, C.; Krawitz, P.; Bauer, S.; Hecht, J.; Bigham,
A. W.; Bamshad, M.; de Condor, B. J.; Schweiger, M. R.;
Robinson, P. N. Identity-by-Descent Filtering of Exome
Sequence Data for Disease-Gene Identification in Autosomal
Recessive Disorders. Bioinformatics 2011 Mar 15, 27 (6),
829836.
102. Ng, S. B.; Turner, E. H.; Robertson, P. D.; Flygare, S. D.;
Bigham, A. W.; Lee, C.; Shaffer, T.; Wong, M.; Bhattacharjee, A.; Eichler, E. E., etal. Targeted Capture and Massively
Parallel Sequencing of 12 Human Exomes. Nature 2009 Sep
10, 461 (7261), 272276.
103. Sunyaev, S.; Ramensky, V.; Koch, I.; Lathe, W., III; Kondrashov, A. S.; Bork, P. Prediction of Deleterious Human
Alleles. Hum. Mol. Genet. 2001 Mar 15, 10 (6), 591597.
104. Morgenthaler, S.; Thilly, W. G. A Strategy to Discover Genes
That Carry Multi-Sllelic or Mono-Allelic Risk for Common
Diseases: A Cohort Allelic Sums test (CAST). Mutat. Res.
2007 Feb 3, 615 (12), 2856.
105. Li, B.; Leal, S. M. Methods for Detecting Associations with
Rare Variants for Common Diseases: Application to Analysis of Sequence Data. Am. J. Hum. Genet. 2008 Sep, 83 (3),
311321.
106. Madsen, B. E.; Browning, S. R. A Groupwise Association
Test for Rare Mutations Using a Weighted Sum Statistic.
PLoS Genet. 2009 Feb, 5 (2), e1000384.
107. Price, A. L.; Kryukov, G. V.; de Bakker, P. I.; Purcell, S. M.;
Staples, J.; Wei, L. J.; Sunyaev, S. R. Pooled Association
Tests for Rare Variants in Exon-Resequencing Studies. Am.
J. Hum. Genet. 2010 Jun 11, 86 (6), 832838.
108. Yi, N.; Zhi, D. Bayesian Analysis of Rare Variants in Genetic
Association Studies. Genet. Epidemiol. 2011 Jan, 35 (1),
5769.
109. Bansal, V.; Libiger, O.; Torkamani, A.; Schork, N. J. Statistical Analysis Strategies for Association Studies Involving Rare
Variants. Nat. Rev. Genet. 2010 Nov, 11 (11), 773785.
110. Ansorge, W. J. Next-Generation DNA Sequencing Techniques. Nat. Biotechnol. 2009 Apr, 25 (4), 195203.
111. Hirst, M.; Marra, M. A. Next Generation Sequencing Based
Approaches to Epigenomics. Brief Funct. Genomics 2010
Dec, 9 (56), 455465, Review.
112. Meyerson, M.; Gabriel, S.; Getz, G. Advances in Understanding Cancer Genomes through Second-Generation Sequencing. Nat. Rev. Genet. 2010 Oct, 11 (10), 685696, Review.
113. Timmermann, B.; Kerick, M.; Roehr, C.; Fischer, A.; Isau,
M.; Boerno, S. T.; Wunderlich, A.; Barmeyer, C.; Seemann,
P.; Koenig, J., etal. Somatic Mutation Profiles of MSI and
MSS Colorectal Cancer Identified by Whole Exome Next
Generation Sequencing and Bioinformatics Analysis. PLoS
One 2010 Dec 22, 5 (12), e15661.
114. Wei, X.; Walia, V.; Lin, J. C.; Teer, J. K.; Prickett, T. D.;
Gartner, J.; Davis, S.; NISC Comparative Sequencing
Program; Stemke-Hale, K.; Davies, M. A., et al. Exome
Sequencing Dentifies GRIN2A As Frequently Mutated in
Melanoma. Nat. Genet. 2011 May, 43 (5), 442446, Epub
2011 Apr 15.
115. Schadt, E. E. Molecular Metworks As Sensors and Drivers
of Common Human Diseases. Nature 2009 Sep 10, 461
(7261), 218223, Review.
116. Chuang, H. Y.; Hofree, M.; Ideker, T. A Decade of Systems

Biology. Annu. Rev. Cell Dev. Biol. 2010 Nov 10, 26, 721
744, Review.
117. Kohl, M. Standards, Databases, and Modeling Tools in Systems Biology. Methods Mol. Biol. 2011, 696, 413427.
118. Kohl, M. Development and Validation of Predictive Molecular Signatures. Curr. Mol. Med. 2010 Mar, 10 (2), 173179,
Review.
119. Subramanian, A.; Kuehn, H.; Gould, J.; Tamayo, P.;
Mesirov, J. P. GSEA-P: A Desktop Application for Gene Set
Enrichment Analysis. Bioinformatics 2007 Dec 1, 23 (23),
32513253.
120. Subramanian, A.; Tamayo, P.; Mootha, V. K.; Mukherjee,
S.; Ebert, B. L.; Gillette, M. A.; Paulovich, A.; Pomeroy, S.
L.; Golub, T. R.; Lander, E. S., etal. Gene Set Enrichment
Analysis: A Knowledge-Based Approach for Interpreting
Genome-Wide Expression Profiles. Proc. Natl. Acad. Sci.
U.S.A. 2005 Oct 25, 102 (43), 1554515550.
121. Chen, L. S.; Hutter, C. M.; Potter, J. D.; Liu, Y.; Prentice, R.
L.; Peters, U.; Hsu, L. Insights into Colon Cancer Etiology
Via a Regularized Approach to Gene Set Analysis of GWAS
Data. Am. J. Hum. Genet. 2010 Jun 11, 86 (6), 860871.
122. Chen, X.; Wang, L.; Hu, B.; Guo, M.; Barnard, J.; Zhu,
X. Pathway-Based Analysis for Genome-Wide Association
Studies Using Supervised Principal Components. Genet. Epidemiol. 2010 Nov, 34 (7), 716724.
123. Cantor, R. M.; Lange, K.; Sinsheimer, J. S. Prioritizing
GWAS Results: A Review of Statistical Methods and Recommendations for Their Application. Am. J. Hum. Genet. 2010
Jan, 86 (1), 622, Review.
124. Holmans, P. Statistical Methods for Pathway Analysis of
Genome-Wide Data for Association with Complex Genetic
Traits. Adv. Genet. 2010, 72, 141179.
125. Hong, M. G.; Pawitan, Y.; Magnusson, P. K.; Prince, J. A.
Strategies and Issues in the Detection of Pathway Enrichment
in Genome-Wide Association Studies. Hum. Genet. 2009
Aug, 126 (2), 289301.
126. Thomas, D. C.; Conti, D. V.; Baurley, J.; Nijhout, F.; Reed, M.;
Ulrich, C. M. Use of Pathway Information in Molecular Epidemiology. Hum. Genomics 2009 Oct, 4 (1), 2142, Review.
127. Wang, K.; Li, M.; Hakonarson, H. Analysing Biological
Pathways in Genome-Wide Association Studies. Nat. Rev.
Genet. 2010 Dec, 11 (12), 843854, Review.
128. Wang, L.; Jia, P.; Wolfinger, R. D.; Chen, X.; Zhao, Z. Gene
Set Analysis of Genome-Wide Association Studies: Methodological Issues and Perspectives. Genomics 2011 Jul, 98 (1),
18.
129. Wang, L.; Jia, P.; Wolfinger, R. D.; Chen, X.; Grayson, B. L.;
Aune, T. M.; Zhao, Z. An Efficient Hierarchical Generalized Linear Mixed Model for Pathway Analysis of GenomeWide Association Studies. Bioinformatics 2011 Mar 1, 27
(5), 686692.
130. Carro, M. S.; Lim, W. K.; Alvarez, M. J.; Bollo, R. J.; Zhao,
X.; Snyder, E. Y.; Sulman, E. P.; Anne, S. L.; Doetsch, F.;
Colman, H., etal. The Transcriptional Network for Mesenchymal Transformation of Brain Tumours. Nature 2010 Jan
21, 463 (7279), 318325.
131. Cerami, E.; Demir, E.; Schultz, N.; Taylor, B. S.; Sander, C.
Automated Network Analysis Identifies Core Pathways in
Glioblastoma. PLoS One 2010 Feb 12, 5 (2), e8918.
132. Torkamani, A.; Schork, N. J. Identification of Rare Cancer
DriverMutationsbyNetworkReconstruction.GenomeRes.2009
Sep, 19 (9), 15701578.
133. Satagopan, J. M.; Smith, A. D. Statistical Methods in Genomics Research. Heart Drug 2003, 3, 4860.
134. Reich, D. E.; Goldstein, D. B. Detecting Association in a
Case-Control Study While Correcting for Population Stratification. Genet. Epidemiol. 2001 Jan, 20 (1), 416, Review.
15
Biographies
hristine Duarte, PhD. Dr Duarte is an assistant professor in the Department of Biostatistics at
C
University of Alabama at Birmingham. Her current research focuses on developing statistical
methods for applying systems biology approaches to high-dimensional genomic data analysis.
Her research interests falls into three main areas: (1) prediction of complex traits and disease
from high dimensional genomic data; (2) developing network models for decomposing genetic
effects on complex traits into intermediate phenotypes; and (3) integrating cross-platform
genomic data using network modeling to explain biological processes of interest such as disease
etiology and treatment response.
aura Kelly Vaughan, PhD. Dr Vaughan is an assistant professor in the Section on Statistical
L
Genetics in the Department of Biostatistics at the University of Alabama at Birmingham. Her
research interests include integrating biological knowledge into the statistical analysis of complex genomic data, pathway analysis, and reproducible research practices.
Mark Beasley, PhD. Dr Beasley is an associate professor and has been a faculty member of
T
the UAB Department of Biostatics and the Section on Statistical Genetics (SSG) since 2001. Dr
Beasley has published over 30 articles in the area of statistical methodology, focused in five
major areas: (1) methodological problems in statistical genetics; (2) nonparametric statistics;
(3) simulation studies; (4) the use of linear models; (5) longitudinal analysis, and (6) mediation
analysis. He has also coauthored over 30 applied research articles in a variety of disciplines
including education, psychology, medicine, genetics, and gerontology, pharmacology.

emant K Tiwari, PhD. Dr Tiwari is William Student Sealy Gosset Professor in Biostatistics
H
and Interim Head of the Section on Statistical Genetics in the School of Public Health at the
University of Alabama at Birmingham. His research interests include genetic linkage analysis,
disequilibrium mapping, genome-wide association studies (GWAS), population genetics, structural variations in the genome, pharmacogenetics/pharmacogenomics, and bioinformatics. At
present, he is involved in gene mapping studies of rheumatoid arthritis, SLE, cardiovascular
diseases, and obesity-related traits.

Chapter 12 - M Ultifactorial Inheritance and Complex Diseases

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Chapter 12 - M Ultifactorial Inheritance and Complex Diseases

Загружено:

Авторское право:

Доступные форматы

CHAPTER

12.2 DEFINITIONS AND TERMINOLOGY

If a disease or condition is caused by a single locus of

allele a is said to be codominant with respect to the wild-type

2013, Elsevier Ltd. All rights reserved.

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

FIGURE 12-1 Expected phenotype distribution for a trait with a

is, not epistatic, where epistatic refers to an interaction (in

FIGURE 12-2 Expected phenotype distribution for a trait with

two independently segregating causal loci of equal effect and allele

FIGURE 12-3 Expected phenotype distribution for a trait with

three independently segregating causal loci of equal effect and

they do not follow the patterns expected of Mendelian

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

descriptions of these principles can be found in the following

FIGURE 12-4 Liability distribution for a multifactorial disease. To be

1. several, but not an unlimited number, of loci are

12.3 DETERMINING THE GENETIC

12.3.1 Familial Aggregation

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

TA B L E 1 2 - 3 Using Twin Concordance and

n(n11 n22 n12 n21 )2

(MZ) twins share all their nuclear genes, any difference

Heritability ranges between 0 and 1, with 0 meaning

multifactorial traits. The strategy consists of comparing

where Var(G) and Var(T) are genetic and total variance,

where Var(A) is the additive genetic variance.

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

where a=number of affected sibs in a sibship, s=number

12.3.2 Segregation Analysis

probability distribution for segregation of genotypes

12.3.3 Linkage Analysis

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

are usually made in calculating lod scores: the mode of

alleles in an individual or in two people are defined as alleles

12.3.4 Transmission Disequilibrium

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

12.4 THE INTERNATIONAL HapMap

(90 individuals), Nigeria, 30 trios from the Centre dEtude

12.5 GENOME-WIDE ASSOCIATION

12.5.1 Study Designs

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

12.5.2 Quality Control

12.5.3 Sex Inconsistency

the reported sex of each individual with predicted sex by

12.5.4 Relatedness and Mendelian Errors

12.5.5 Batch Effects

12.5.6 Marker and Sample Genotyping

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

12.5.7 Population Stratification

12.5.8 Marker Allele Frequency and

The rare SNPs may lead to spurious results because of

12.6 ASSOCIATION METHODS/

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

12.6.2 Validation and Replication Phase

12.7 ANALYSIS OF RARE VARIANTS

technology and advances in analytical methods (discussed

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

12.8 INTEGRATION OF GENETIC,

steady stream of videos offering tutelage on these advances can

samples allow many new investigations to be undertaken

CHAPTER 12 Multifactorial Inheritance and Complex Diseases

21. Lander, E. S.; Green, P. Construction of Multilocus Genetic

1. several, but not an unlimited number, of loci are

TA B L E 1 2 - 3 Using Twin Concordance and

21. Lander, E. S.; Green, P. Construction of Multilocus Genetic

43. Amos, C. I. Robust Variance-Components Approach for

79. Fisher, R. A. On the Interpretation of 2 from Contingency

116. Chuang, H. Y.; Hofree, M.; Ideker, T. A Decade of Systems