
The Application of

Naive Bayes Model Averaging


to Predict Alzheimer's Disease
from Genome-Wide Data
Wei Wei, Shyam Visweswaran and Gregory F. Cooper

Background
° Genome-wide association studies (GWASs)
° Single-nucleotide polymorphism (SNP)
° High-throughput genotyping technologies
° Alzheimer's disease (AD):
  ° AD afflicts about 10% of persons over 65 and almost half of those over 85
  ° ~5.5 million cases currently in the U.S.
  ° 95% of all AD cases are Late-Onset AD (LOAD)
Data

° Source
  ° TGEN dataset by Reiman et al.*
° Cases
  ° 1411 individuals
  ° 861 LOAD and 550 controls
° SNPs
  ° 312,316 SNPs
  ° Two additional SNPs (rs429358 and rs7412) genotyped separately (these determine APOE status)
____________________________________________________________________
* Reiman E, Webster J, Myers A, Hardy J, Dunckley T, Zismann V, et al. GAB2 alleles modify Alzheimer's risk in APOE epsilon4 carriers. Neuron. 2007;54(5):713-20.
Methods

° Bayesian Model Averaging
  ° Represents uncertainty about the correctness of any given model
  ° Performs inference by weighting the prediction of each model by our uncertainty in that model
° Model-Averaged Naive Bayes (MANB)
  ° MANB efficiently averages over all naive Bayes models (on a given set of variables) in making a prediction for an individual patient case
Naive Bayes (NB)

[Figure: NB model structure. LOAD is the parent node, with an arc to each of SNP 1, SNP 2, SNP 3, ..., SNP 312,318.]
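To make the NB structure concrete, here is a minimal inference sketch (not the authors' implementation); the array shapes, the genotype coding in {0, 1, 2}, and the function names are assumptions for illustration:

```python
import numpy as np

def nb_predict(log_prior, log_cpt, snps):
    """Naive Bayes inference: P(LOAD | snps) is proportional to
    P(LOAD) * prod_i P(snp_i | LOAD), computed in log space.

    log_prior: shape (2,), log P(LOAD = c) for c in {0, 1}
    log_cpt:   shape (n_snps, 2, 3), log P(snp_i = v | LOAD = c)
               for genotype values v in {0, 1, 2} (coding is an assumption)
    snps:      shape (n_snps,), one individual's observed genotypes
    """
    idx = np.arange(len(snps))
    # For each class c, sum log P(snp_i = observed value | LOAD = c) over i.
    loglik = log_cpt[idx[:, None], np.arange(2)[None, :], snps[:, None]].sum(axis=0)
    log_post = log_prior + loglik
    post = np.exp(log_post - log_post.max())   # subtract max for stability
    return post / post.sum()                   # [P(LOAD=0 | snps), P(LOAD=1 | snps)]
```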
Feature Selection Naive Bayes (FSNB)

Perform feature selection using a greedy, forward-stepping search that optimizes the prediction of LOAD (a sketch of such a search follows below).

[Figure: FSNB model structure. LOAD is the parent node, with arcs only to the selected SNPs, e.g., SNP 25,920, SNP 276,455, SNP 104,582, and SNP 1,100.]
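The following is a hedged sketch of a greedy, forward-stepping search of this kind; the scoring function (validation AUC), the stopping rule, and the use of scikit-learn's CategoricalNB are assumptions, not the authors' exact procedure:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import roc_auc_score

def fsnb_select(X_tr, y_tr, X_val, y_val, max_features=10):
    """Greedily add the SNP that most improves validation AUC; stop when
    no single SNP helps. Genotypes are assumed coded as 0/1/2."""
    selected, best_auc = [], 0.5
    while len(selected) < max_features:
        best_j = None
        for j in range(X_tr.shape[1]):          # one pass over every SNP...
            if j in selected:
                continue
            cols = selected + [j]
            nb = CategoricalNB(min_categories=3).fit(X_tr[:, cols], y_tr)
            auc = roc_auc_score(y_val, nb.predict_proba(X_val[:, cols])[:, 1])
            if auc > best_auc:
                best_auc, best_j = auc, j
        if best_j is None:                      # no improvement: stop
            break
        selected.append(best_j)
    return selected
# Each greedy step rescans all ~312,000 SNPs, which is one reason FSNB's
# training time is far higher than NB's or MANB's (see the run-time results).
```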
Model-Averaged Naive Bayes (MANB)

[Figure: MANB model structure. LOAD is the parent node; each of SNP 1, SNP 2, ..., SNP 312,318 may or may not have an arc from LOAD.]
Model-Averaged Naive Bayes (MANB)

MANB averages over all 2^312,318 NB models (Model 1, ..., Model i, ..., Model 2^312,318), one for each subset of arcs from LOAD to the SNPs:

  P(LOAD | data) = Σ_i P(LOAD | data, Model_i) P(Model_i | data)
Model-Averaged Naive Bayes (MANB)

° We can take advantage of the conditional independence relationships in NB models to make it efficient to model average over all of these models.
° The computational "trick" is as follows* (a code sketch follows below):
  ° For each SNP_i, we construct a model-averaged conditional probability, P(SNP_i | LOAD), by averaging over whether or not there is an arc from LOAD to SNP_i. This step can be viewed as a "soft" form of feature selection.
  ° We use these model-averaged conditional probabilities to define a new NB model M over which we now perform NB inference.
  ° Performing inference with M is the same as model averaging over the exponential number of NB models discussed previously.
____________________________________________________________________
* Dash D, Cooper G. Exact model averaging with naive Bayesian classifiers. International Conference on Machine Learning (2002) 91-98.
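A minimal sketch of this trick, in the spirit of Dash and Cooper (2002): for each SNP, weight the "arc present" conditional table against the "arc absent" marginal by the arc's posterior probability. The uniform Dirichlet priors, the Laplace smoothing, and the binary LOAD coding here are simplifying assumptions, not the paper's exact derivation:

```python
import numpy as np
from scipy.special import gammaln

def log_ml(counts, alpha=1.0):
    """Log marginal likelihood of one or more rows of counts under a
    uniform Dirichlet(alpha, ..., alpha) prior."""
    counts = np.atleast_2d(counts).astype(float)
    k = counts.shape[1]
    return np.sum(gammaln(k * alpha) - gammaln(k * alpha + counts.sum(axis=1))
                  + np.sum(gammaln(alpha + counts) - gammaln(alpha), axis=1))

def manb_mixed_cpt(x, y, p_arc, n_values=3):
    """Model-averaged P(x | y): mix the 'arc present' CPT with the
    'arc absent' marginal, weighted by the arc's posterior probability.
    x: one SNP's genotypes (0/1/2); y: LOAD status (0/1)."""
    counts_y = np.array([np.bincount(x[y == c], minlength=n_values)
                         for c in (0, 1)])               # per-class counts
    counts_m = counts_y.sum(axis=0)                      # marginal counts
    # Posterior odds of the arc LOAD -> X: prior odds times the Bayes factor.
    log_odds = (np.log(p_arc) - np.log1p(-p_arc)
                + log_ml(counts_y) - log_ml(counts_m))
    w = 1.0 / (1.0 + np.exp(-log_odds))                  # P(arc | data)
    cpt_arc = (counts_y + 1.0) / (counts_y + 1.0).sum(axis=1, keepdims=True)
    cpt_none = (counts_m + 1.0) / (counts_m + 1.0).sum()
    return w * cpt_arc + (1 - w) * cpt_none              # shape (2, n_values)
```

A SNP whose arc posterior w is near 0 contributes an (almost) class-independent table, which is what makes this a "soft" form of feature selection.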
Priors

° Structure priors
  ° FSNB and MANB assume each arc is present with some probability p, independent of the status of other arcs in the model.
  ° Informed by the literature, we chose a value of p that yields an expected number of arcs of 20, i.e., p = 20 / 312,318 ≈ 6.4 × 10^-5 (the arithmetic is sketched below).
° Parameter priors
  ° If we think of P(SNP_i | LOAD) as defining a table of probabilities, then we assume that every way of filling in that table (consistent with the axioms of probability) is equally likely.
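As a quick check of the structure-prior arithmetic (a sketch using the slide's own numbers; the variable names are illustrative):

```python
# Expected number of arcs = p * (number of candidate arcs LOAD -> SNP_i),
# so solve for p given the target of 20 expected arcs.
n_candidate_arcs = 312318
expected_arcs = 20
p = expected_arcs / n_candidate_arcs
print(f"p = {p:.2e}")   # ~6.40e-05
```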
Evaluation Methods

° Five-fold cross-validation (a sketch of the evaluation loop follows below)
° Performance measures
  ° Area under the ROC curve (AUC) as a measure of discrimination
  ° Calibration plots and Hosmer-Lemeshow goodness-of-fit statistics
  ° Run time
° Control algorithms
  ° NB
  ° FSNB
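A hedged sketch of the evaluation loop described above; `fit_model` stands in for training any of NB, FSNB, or MANB and is an assumption, not the authors' code:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def cross_validate_auc(X, y, fit_model, n_splits=5, seed=0):
    """Five-fold cross-validation; returns the mean and per-fold AUCs.
    fit_model(X, y) must return an object exposing predict_proba()."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for tr, te in cv.split(X, y):
        model = fit_model(X[tr], y[tr])
        p_load = model.predict_proba(X[te])[:, 1]   # P(LOAD = 1) per test case
        aucs.append(roc_auc_score(y[te], p_load))
    return float(np.mean(aucs)), aucs
```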
Results: Run Time

[Figure: bar chart of training time in seconds. MANB: 16.1, NB: 15.6, FSNB: 1684.2.]

Machine parameters: CPU 2.33 GHz, RAM 2 GB. Training time was the average over the five cross-validation folds. Time for loading data into memory is not included, but was about XYZ seconds.
º " º  "m
  
 "m
 ! 
@"(95%
confidence interval of
their AUC difference is
-0.008 to 0.029). Their
performance is strongly
influenced by several
APOE SNPs.
 "m
 @"
 


 (p<0.00001).
Results: Calibration

[Figure: calibration plots.]

MANB and NB were poorly calibrated, with almost all the test cases having probability predictions near 0 or 1. Such extreme predictions occur because there is such a large number of features in the model.
Results: Calibration (cont.)

FSNB was the best calibrated algorithm among the three we evaluated. This result is likely due to the FSNB models containing only a few SNP features (< 4).
Results: Calibration (cont.)

According to the Hosmer-Lemeshow goodness-of-fit statistics, FSNB was well calibrated, whereas MANB and NB were not. We believe this result may be due to FSNB having such a small number of features in its models.
Conclusions

[Summary table comparing NB, FSNB, and MANB on discrimination (AUC), calibration, and run time.]
° A full description of the MANB algorithm is available in the appendix of our paper.
° It provides all the details needed to readily implement the algorithm.
Future Work

° Apply the MANB algorithm to additional datasets
° Predict additional clinical outcomes
° Use both genomic and clinical data to predict clinical outcomes
° Explore the use of additional genome-wide measurement platforms, including next-generation sequencing data
° Include additional control algorithms in future evaluations
"  

° We thank Mr. Kevin Bui for his help in data


preparation, software development, and the preparation
of the appendix. We thank Dr. Pablo Hennings-
Yeomans, Dr. Michael Barmada, and the other members
of our research group for helpful discussions.
° The research reported here was funded by NLM grant
R01-LM010020 and NSF grant IIS-0911032.
Thank you

Questions?
