
John Wiggins

Potentially New SNPs Associated with Familial Breast Cancer Susceptibility Loci
Abstract:
Modern Day Breast Cancer Susceptibility Background and Review:
Many breakthroughs have been made in recent decades in the genetic approach to
breast cancer, one of the most notable being the identification of the BRCA1 and BRCA2
genes in the early 1990s. Although this breakthrough accounted for roughly 5-10% of
breast cancer cases (those caused by mutations within the two genes), it still left a
large share of familial breast cancer risk unexplained. Further gene mutations and their
association with breast cancer have since been established, studied, and reviewed;
however, an astounding 75% of familial breast cancer risk is still unexplained. This
suggests that many other genes, and their associated loci, are responsible for the
remaining heritable risk. One of the more recent studies that
investigated these possible unknown loci was the study, or rather conglomerate of
multiple individual studies, "Genome-wide association study identifies novel breast
cancer susceptibility loci," which examined a multitude of individuals throughout Europe
and parts of Asia. This study, which was conducted by a large group of authors including
Douglas Easton, set out to construct a possible genetic map, through identification of
new SNPs and use of the HapMap, in an effort to find new genetic markers associated with
familial breast cancer risk. It implemented a three-tiered study, which began with a small
number of participants and a large number of SNPs; each successive stage increased the
number of participants and decreased the number of SNPs in order to narrow down the
candidate genetic regions and pinpoint the loci at which the mutations could be found.
To increase specificity and to show quantitatively that the data were more significant
than in the previous stage, the researchers subjected the data to a Cochran-Armitage
score test with one degree of freedom and analyzed each stage separately, so that no
previous stage's results could influence a particular locus's association with breast
cancer in a later stage. Upon completion of stage 3, five SNPs showed a statistically
significant association with breast cancer. These five SNPs, by rs number, are
rs2981582, rs3803662, rs889312, rs13281615, and rs3817198. Their percentage association
values with breast cancer risk identification are 97%, 71%, 25%, 3%, and 1%,
respectively. The statistical relevance of these association values, as well as their
derivation, can be seen in the figure below:

FIGURE 1

To unpack the statistical significance of the above information, the axes and
indicators within the graph must first be analyzed. Each of the five panel letters
represents one of the five SNPs, with a being rs2981582, b being rs3803662, and the
other three SNPs following in order of decreasing association value. The x and y axes
represent the per-allele odds ratio and the individual study contributing the estimate,
respectively. The per-allele odds ratio, also labeled OR, is a statistical measure of
association between a particular exposure, in this case carrying the risk allele of that
SNP, and an outcome, in this case developing breast cancer. The odds of an event with
probability P are P/[1-P]; the odds ratio divides the odds of carrying the risk allele
among cases by the same odds among controls, and is therefore interpreted relative to
one. The conceptual implications of the OR value are as follows:

If the OR=1, then that particular SNP does not affect that individual's risk of
breast cancer.
If the OR<1, then that particular SNP is protective, i.e. it decreases a person's risk
of breast cancer (this was not seen for any of the five SNPs).
If the OR>1, then that particular SNP increases the individual's risk of breast cancer.

It follows that all five of the SNPs studied had an OR slightly above one; whether each
of these values can be distinguished from one, and therefore put to practical use, is
judged by its p-value.
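As a minimal sketch of how an odds ratio of this kind can be computed, the Python below works from simple allele counts in cases and controls and attaches a log-based (Woolf) 95% confidence interval. The function name and the counts are hypothetical, and the original study may well have estimated its per-allele ORs differently (for example by regression), so this is only an illustration of the concept.

```python
import math


def allele_odds_ratio(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Odds ratio for carrying the risk allele, cases versus controls.

    Arguments are allele counts (hypothetical inputs, not study data).
    Returns (odds_ratio, ci_lower, ci_upper) using Woolf's log method.
    """
    odds_cases = case_risk / case_other        # odds of risk allele in cases
    odds_controls = ctrl_risk / ctrl_other     # odds of risk allele in controls
    odds_ratio = odds_cases / odds_controls

    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts chosen only to illustrate an OR slightly above one.
odds_ratio, lower, upper = allele_odds_ratio(4200, 5800, 3800, 6200)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that lies entirely above one corresponds to the OR>1 case in the list above, while an interval that straddles one cannot be separated from OR=1.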
It should be noted here that the SNPs were funneled and eliminated statistically
by means of p-value thresholds. Stage 1 required a p-value of less than 0.05, the
conventional cutoff for calling data statistically different from the null, the null
being that the SNP analyzed has no association with breast cancer risk. Stage 2 was
more stringent and required a p-value of less than 2 x 10^-5, and stage 3, which is
depicted above (Fig. 1), required a p-value of less than 10^-7.
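The funnel described above can be sketched in a few lines of Python. The thresholds are the ones quoted in the text; the function name and the example p-values are hypothetical and serve only to show how a SNP drops out once it fails a stage.

```python
# Stage thresholds as quoted in the text: stage 1, stage 2, stage 3.
STAGE_THRESHOLDS = (0.05, 2e-5, 1e-7)


def stages_passed(stage_p_values):
    """Count how many consecutive stages a SNP passes.

    stage_p_values holds one p-value per stage, in stage order. A SNP is
    carried forward only while its p-value is below that stage's threshold.
    """
    passed = 0
    for p_value, threshold in zip(stage_p_values, STAGE_THRESHOLDS):
        if p_value >= threshold:
            break  # eliminated at this stage, later stages are never reached
        passed += 1
    return passed


# Hypothetical p-values for three made-up SNPs.
examples = {
    "snp_x": [0.01, 1e-6, 1e-9],   # survives all three stages
    "snp_y": [0.01, 1e-3, 1e-9],   # eliminated at stage 2
    "snp_z": [0.20, 1e-6, 1e-9],   # eliminated at stage 1
}
for name, p_values in examples.items():
    print(name, "passed", stages_passed(p_values), "stage(s)")
```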
Now that the x-axis labeling and its significance have been established, it becomes
pertinent to define each data point, or entry, and classify its significance based upon
the y-axis. Each point, or square, represents the OR estimate for that particular SNP
(a, b, etc.) in one study, whereas the line extending to either side of the square marks
that estimate's 95% confidence interval. Within the confines of this particular study,
the confidence interval is the range of OR values consistent with the data: if the study
were repeated many times, about 95% of intervals built this way would contain the true
OR, and an interval that includes one cannot be statistically differentiated from the
null of no association with breast cancer. So, practically speaking, the most precise
findings are those with the smallest confidence intervals, which generally come from the
largest studies. To add a little context to these individual OR values, the top two rows
of the y-axis give the OR values from stages one and two, whereas the following rows,
such as MCCS or BCST, are names of particular studies within that region. The diamonds
above the lowest y-axis row marked TOTAL are probably the most significant pieces of
data; they are the averaged ORs and confidence intervals of each specific SNP for each
region of study, European or Asian.
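The text describes the diamonds only as averaged ORs. One common way to combine per-study estimates is inverse-variance weighting on the log scale, and the sketch below assumes that approach purely for illustration; the original study may have pooled its results differently. The function name and the study values are hypothetical.

```python
import math


def pool_odds_ratios(estimates, z=1.96):
    """Fixed-effect, inverse-variance pooling of odds ratios.

    estimates is a list of (odds_ratio, ci_lower, ci_upper) tuples, one per
    study. The standard error of each log(OR) is recovered from the width of
    its confidence interval. Returns (pooled_or, ci_lower, ci_upper).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1 / se ** 2              # precise studies count for more
        total_weight += weight
        weighted_sum += weight * math.log(odds_ratio)

    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical per-study estimates for a single SNP.
studies = [(1.30, 1.05, 1.61), (1.20, 1.10, 1.31), (1.25, 0.95, 1.64)]
pooled, lower, upper = pool_odds_ratios(studies)
print(f"pooled OR = {pooled:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Under this weighting the pooled interval is narrower than any single study's interval, which is why a combined diamond can be far more decisive than the individual squares.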
Ultimately, the diamonds in the y-axis row labeled TOTAL, which are found by
combining the Asian and European study results, are where the true practical
implications are derived. For your convenience, Figure 1 is again given below:

The immediate takeaway from the total OR values is this: the higher the
percentage association of the SNP with breast cancer, the higher the total OR value.
Furthermore, the range spanned by these OR values is small, around 0.1 when
approximated from the graph, even though the highest SNP association is 97% and the
lowest is 1%. Even so, a difference of this size is actually quite large when discussing
SNP significance, which makes sense given that there are millions of SNPs within the
human genome, and potentially far more interactions between these SNPs leading to an
ultimate phenotypic result such as breast cancer. To place Figure 1's results into a
more genomic context, the gene locations of these rs numbers, and their associated SNPs,
are listed at their respective chromosomal positions in the table below:
TABLE 1

All five SNPs, as well as some additional SNPs (the table shows data for multiple
studies), can be seen in the table above along with their chromosomal locations.
Furthermore, the MAF values of each SNP location, MAF being the frequency at which the
least common allele occurs within a given population, are all above 0.05 (meaning that
they are common enough to be targeted by the HapMap project). These MAF values, along
with the corresponding OR and p-value trend data, begin to illustrate just how elaborate
and interacting these possible SNPs (gene mutations) that increase breast cancer risk
truly are.
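To make the MAF definition concrete, the following sketch computes it from genotype counts for a single biallelic SNP. The function name and the counts are hypothetical; only the 0.05 cutoff comes from the text.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts at one biallelic SNP.

    n_aa and n_bb are the numbers of the two homozygotes and n_ab the number
    of heterozygotes. Each person carries two alleles, so the allele total is
    twice the number of people genotyped.
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1 - freq_a)  # the rarer of the two alleles


# Hypothetical genotype counts for 1000 people.
maf = minor_allele_frequency(360, 480, 160)
print(f"MAF = {maf:.2f}, above 0.05 cutoff: {maf > 0.05}")
```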
This modern study, which was really a conglomeration of many smaller studies
with a vast array of individual participants over a number of years, explained just
3.6% of the previously unknown 75% of familial breast cancer risk, and retained only
five of the initial 205,586 SNPs that were examined in stage one. Although this is a
vast achievement in helping to identify the genetic mutations that result in an
increased risk of breast cancer, these results also illustrate how much more research
needs to be done.
Potential Development for Identification of SNPs Involved in Breast Cancer Risk:
While the above study conglomerate broke incredible ground in identifying SNPs
associated with familial breast cancer risk, it also exposed a multitude of areas that
may be improved and elaborated upon in order to paint a more complete picture of
specific SNP associations with breast cancer, as well as to provide a better
understanding of each gene's location on the chromosome and that gene's role within the
cell. First and foremost, it should be noted that none of the newly identified loci in
the mentioned study (TABLE 1) featured genes or gene products linked with DNA repair,
sex hormone synthesis, or metabolic pathways, all three of which were commonly thought
to be the primary types of loci associated with the development of breast cancer.

Furthermore, of the loci identified in Table 1, only the FGFR2 locus (the locus
corresponding to the rs number of the SNP that had a 97% association value as an
identifier of breast cancer) has a clear prior history of linkage to breast cancer.
Additionally, the FGFR2 locus is commonly associated with cell growth and cell
signaling; indeed, 3/5 of the loci examined within this study, as well as in the
contributing studies that supplement Table 1, are associated with cell growth and cell
signaling. This is incredibly fascinating because it suggests that mutations of genes at
loci associated with these common cell functions could potentially be a direct indicator
of a higher risk of breast cancer. Furthermore, four of the five SNPs associated with
this particular study (FIGURE 1) show a less than 75% association with breast cancer
risk identification. It can be hypothesized from this that the four genes defined by
their respective rs numbers, and their associated SNP mutations, do not act alone in
raising breast cancer risk, but function in concert with other unknown genetic mutations
at potentially vastly different loci on various chromosomes within the human genome.
Additionally, if the sample size of the study were increased in number and diversity and
followed over a longer period of time, new and different patterns of linkage
disequilibrium would likely be observed, providing a more precise look at what
additional genetic mutations could be functioning together with these four genes, and
thus a clearer and more descriptive identification of breast cancer risk.
Experimental Design to Better Understand Specific Loci Interaction:
In order to explain and elaborate upon these four loci and their potential
association with breast cancer better than the previous study did, and thereby extend
modern-day knowledge of these SNPs, the parameters of the experiment, such as subjects,
statistical analysis, and overall guidelines, must be examined.

First and foremost, the issue of selecting subjects must be addressed. Genetics
is a very specific field in which a wide array of factors, such as environment,
disease, climate, etc., shape and alter the way a human population, and its population
genome, evolves over time. Practically speaking, this means that depending upon a
people's geographic location in the world, certain mutations and genes will have
developed differently from those of other peoples in different living conditions, over
multiple generations of course. In the context of subject selection for this study, this
means that the subjects should come from a wide array of countries and climates, which
will help neutralize potential instances of phenocopy, where an apparent phenotype
within a population seems to be the product of a genetic mutation but is actually caused
by an environmental factor, and will help generate LD (linkage disequilibrium) pattern
recognition within the research so that SNP mutational findings may be more precise. To
best achieve this diversity, subject selection should be completed in multiple stages,
beginning in one primary location and then branching out from there.
Stage One: Small Population, Large Number of SNPs
In order to eliminate SNPs from further association analysis more effectively
than in the previous study, it is probably best to restrict initial research to specific
major cities, rather than countries, so as not to categorize an entire country too
broadly (for instance, it would not be fair to state that the living conditions for
people in Wyoming are the same as those in New York). Furthermore, age must be
controlled so that outside variables unrelated to this particular study will not affect
the data. Next, each of the participants needs to have a recent, definitive familial
case of breast cancer; this is generally accepted as two first-degree relatives (a
parent, sibling, or child) with the disease. Lastly, since the four loci being studied
have already been identified as having no link with DNA repair, sex hormone synthesis,
or metabolic pathways, and in order to build upon the finding that 3/5 of the loci were
associated with cell growth and cell signaling, excluding the previously cancer-linked
FGFR2 locus, participants with mutations at BRCA1, BRCA2, or any other locus that has
been extensively associated with DNA repair, sex hormone synthesis, or metabolic
pathways should be excluded. This will help identify more specifically what types of
cell signaling and cell growth gene mutations might be working together with one of the
four SNPs to increase the risk of breast cancer. Of course, as at all stages of this
study, and as in any scientific study, participant selection from subjects who meet the
previously defined parameters should be entirely random to provide the least biased
results possible, and the numbers of experimental and control subjects should be roughly
equal.
Stage One Parameters: 50 experimental subjects and 50 control subjects per state

Experimental subject guidelines:
o Women under 40 with invasive breast cancer
o 2 first-degree relatives with breast cancer
o No mutations in BRCA1, BRCA2, or other major genes associated with DNA
repair, sex hormone synthesis, or metabolic pathways
Control Subjects Defined As:
o Women over 40 without cancer
Subjects Randomly Selected From (Location):
o Most heavily populated cities within each state of the United States (50
experimental and 50 control per state)
Total number of experimental subjects: 2500

When advancing to stage 2, much like in the previous study, the number of SNPs
examined should become narrower and the number and variety of participants should
increase. To do this, it becomes essential to relax criteria such as the specific age
guidelines for the experimental group; however, the genetic restrictions must remain in
place in order to maintain the integrity of the type of interaction being searched for
(no DNA repair genes, metabolic pathway genes, or sex hormone synthesis genes).
Stage Two Parameters: roughly 20000 experimental, 20000 control per region

The control, experimental, and omission requirements hold the same as in Stage One;
however, the age guideline for the experimental group has been relaxed so that women
under 50 with breast cancer may be considered. The control group requirements remain
the same.
Subjects Randomly Selected From Previous and Very Recent Combined Studies
that Encompass:
o East Coast of the United States, West Coast of the United States,
Northern Area of the United States, Southern Area of the United States,
Mexico, South America, and Canada (20000 control and 20000 experimental per
region)
Total number of experimental subjects: 140000

It is very important to note that, since defined parameters have been established at
this point by stage one, it is much more financially prudent to draw on a multitude of
previous studies within these areas, combining them to provide a clearer and more
specific look at genetic loci mutations found within a multitude of individuals in the
Western Hemisphere.
Lastly, to produce the clearest and most helpful statistical analysis of these four
loci and their potential relation with other gene mutations, a vast array of subjects
should be examined across the entire spectrum of genetic variation. Cases, as recent as
are available, should be examined using the same criteria as in Stage Two; however,
many more cases in many different areas need to be analyzed.

Stage Three Parameters: 500000 subjects, 60 studies

All control, experimental, and omission guidelines the same as in Stage Two
Subjects Randomly Selected From:
o North America, South America, Europe, Asia, Africa, and Australia.
o Nearly 85,000 subjects per region
o Total experimental subjects: 500000
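The subject totals quoted for the three stages can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the class and variable names are hypothetical, and the region counts (50 states, seven Stage Two regions, six Stage Three regions) are read directly from the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One recruitment stage of the proposed design."""
    name: str
    regions: int            # number of sampling regions (states, areas, ...)
    cases_per_region: int   # experimental subjects drawn from each region

    @property
    def total_cases(self) -> int:
        return self.regions * self.cases_per_region


stage_one = Stage("Stage One", regions=50, cases_per_region=50)
stage_two = Stage("Stage Two", regions=7, cases_per_region=20_000)

# Stage Three fixes the total instead, so the per-region figure is derived.
stage_three_total = 500_000
stage_three_regions = 6
stage_three_per_region = stage_three_total / stage_three_regions

for stage in (stage_one, stage_two):
    print(stage.name, "experimental subjects:", stage.total_cases)
print("Stage Three subjects per region:", round(stage_three_per_region))
```

Dividing the Stage Three total evenly gives a little over 83,000 subjects per region, which is in line with the approximate per-region figure listed above.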

Obviously, this will be incredibly difficult to achieve. Much like Stage Three of the
previous study, at this stage data from individuals will need to be taken exclusively
from prior studies throughout each of these regions. In order to reach the desired
number of subjects, roughly 60 studies will need to be combined and analyzed, result by
result, to make substantial headway in identifying gene mutation interactions with these
four loci and their respective consequences for breast cancer. While this mass
genotyping might seem incredibly cost-ineffective and time-consuming, it is nearly a
necessity for the quickest possible analysis and will also help to provide insight in a
multitude of other areas, such as advancement of the HapMap, many new SNP loci and their
respective mathematical association with breast cancer risk, and an updated continental
breakdown of where each particular region's population stands genetically.
The most advantageous part of this study when compared to its predecessor is
its more effective SNP genotyping and, more importantly, its examination of genotypic
association between seemingly unrelated SNP activity and the potential shared phenotypic
consequence with the four initial SNPs that appeared to show association with breast
cancer. All SNPs genotyped in stage one should be tagged using HapMap phase II as a
reference. In the previous study, just under 50% of tagged SNPs were found on the
HapMap. Hopefully, the drastically increased sample size of this study will yield a
higher number of tagged SNPs and a lower proportion of SNPs found on the HapMap, since
any tagged SNP not on the HapMap has the potential to be a new genetic mutation
associated with the activity of the four known SNPs and their combined link to breast
cancer. The final step of stage one is to establish which of the tagged SNPs not
identified on the HapMap are surrogates, that is, SNPs that fall into perfect linkage
disequilibrium based on genotyping of specific individuals.
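Perfect linkage disequilibrium is usually expressed as r^2 = 1 between two SNPs. As a minimal sketch, assuming phased haplotype counts are available for a pair of biallelic SNPs (the function name and counts below are hypothetical, and real genotype data would first need phasing or an estimation step), r^2 can be computed as follows.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from phased haplotype counts.

    SNP 1 has alleles A/a and SNP 2 has alleles B/b; n_AB is the number of
    haplotypes carrying A together with B, and so on.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at SNP 2

    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")

    d = p_AB - p_A * p_B           # disequilibrium coefficient D
    return d * d / denominator


# Hypothetical counts: the two SNPs always travel together (a surrogate pair).
print(ld_r_squared(n_AB=30, n_Ab=0, n_aB=0, n_ab=70))
# Hypothetical counts: the two SNPs are inherited independently.
print(ld_r_squared(n_AB=25, n_Ab=25, n_aB=25, n_ab=25))
```

A pair with r^2 at or very near one carries the same information, so genotyping either SNP stands in for the other.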

One of the biggest potential drawbacks of the previous study was that all
individuals who served as surrogates with which to check these potentially new
significant SNPs were Caucasian. While this might not be the largest factor in SNP
analysis, it could lead to an SNP that affects someone of a different ethnic background
being overlooked.

It is critical that stage one of this study include individuals of multiple ethnic
backgrounds when checking for LD patterns.

The statistical analyses of stages two and three, although complex in calculation,
have relatively simple conceptual meanings. Once the initial stage has concluded, and
all tagged SNPs (MAF >5%) that were not genotyped by the HapMap have passed surrogate
testing through LD genotyping of select individuals, all that is left in the analysis of
the three stages is statistical separation of each SNP from the null hypothesis that it
has no association with breast cancer or with the activity of the four initial loci.
This is done by requiring a lower p-value at each stage, so that a surviving SNP carries
stronger evidence that it is involved with breast cancer or with the activity of the
four original loci. Lastly, by applying a combined adjustment factor (this helps to
establish difference from the null via chi-square analysis) along with the genomic
control method to those SNPs that retain a significantly low p-value even after stage
three analysis, it can be reasonably confirmed which loci are interacting with the four
initial loci, and which new loci have a potential association with breast cancer. Using
the same p-value requirements for each stage as in the previous study, each individual
locus can be assessed for its own association with breast cancer risk, as well as
compared with each of the four initial SNPs to evaluate possible genetic linkage. One of
the largest benefits of this study over the previous one, however, is that the sheer
sample size should allow multiple loci to meet the p<10^-7 requirement of the study and
hopefully shed more light on more strongly linked associations.
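The genomic control method mentioned above is commonly implemented by estimating an inflation factor from the median of the observed one-degree-of-freedom chi-square statistics and dividing every statistic by it. The sketch below assumes that standard form; the function name and the example statistics are hypothetical, and the exact adjustment used in the previous study is not reproduced here.

```python
import math
import statistics

# Median of a chi-square distribution with one degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Deflate 1-df chi-square statistics by the genomic inflation factor.

    Returns (inflation_factor, adjusted_stats, adjusted_p_values). The factor
    is floored at 1.0 so that statistics are never inflated.
    """
    inflation = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    inflation = max(inflation, 1.0)
    adjusted = [stat / inflation for stat in chi2_stats]
    # For one degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_values = [math.erfc(math.sqrt(stat / 2)) for stat in adjusted]
    return inflation, adjusted, p_values


# Hypothetical test statistics for five SNPs, one of them strongly associated.
inflation, adjusted, p_values = genomic_control([0.2, 0.5, 0.91, 1.5, 30.0])
print(f"inflation factor = {inflation:.2f}")
for stat, p_value in zip(adjusted, p_values):
    print(f"adjusted chi2 = {stat:.2f}, p = {p_value:.2e}")
```

Only SNPs whose adjusted p-values still clear the stage threshold would be carried forward, which guards against associations that merely reflect population structure.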
Expected Results and Given Limitations as well as Possible Solutions:
Ultimately, scaling the ratios observed in the previous study, one should expect
the following results after Stage Three genotypic and statistical analysis, assuming
the same ratio of tagged SNPs to HapMap identification and LD association (using a
scaled number of genotyped individuals from which LD patterns may be observed):

Assuming 500,000 experimental participants, a theoretical 50 million potential
SNPs could be tagged. Of those tagged, assuming that the same percentage of
loci containing those SNPs passes stage three statistical analysis and still
shows significant association values, it should be expected that 119 potentially
new loci would have some association with increased breast cancer risk.


Limitations: Obviously, some of the same limitations from the previous study
apply to this study as well. Many of the newly identified loci will raise
questions of association with other weakly associated SNPs, and even though the
sample size is much greater in this case, additional studies will need to be
performed to help achieve the ultimate goal of tailoring breast cancer risk
assessment to the individual.

To specifically address the issues within this particular study, assuming the expected
outcome and actual outcome are statistically different, it would be most pertinent to
first acknowledge that simply applying mathematical ratios to human genetic
diseases is not always the most accurate approach. Factors such as multiple loci
contributing to the phenotypic result of breast cancer, SNPs easily missed in the
mass of analysis that has to take place (it follows that there would be a much
higher number of missed SNPs in this study than in the original due to the greater
amount of analysis), as well as a multitude of other issues, could all significantly
alter the number of new loci found and their association values, such as the OR,
the confidence interval, and even the adjustment factor. Ultimately, much more
research is still necessary to predict the involvement of the many genetic
mutations, as well as their combination with environmental factors, and to produce
a clear and well-defined picture of breast cancer risk identification; however,
this study helps to identify many other mutation loci that are associated with
breast cancer and begins to establish possible connections among multiple
locations of lower-association SNPs interacting with one another to increase
breast cancer risk.
Acknowledgements:
All credit for the previous study should go to the work done by Douglas Easton
and colleagues, along with all associated citations found within said work, in the
publication of the study, "Genome-wide association study identifies novel breast
cancer susceptibility loci." The citation is given below.

Easton DF, Pooley KA, Dunning AM, et al. Genome-wide association study identifies
novel breast cancer susceptibility loci. Nature. 2007;447(7148):1087-1093.
doi:10.1038/nature05887.
