Methods: Bootstrapping
Bryce Bucknell
Jim Burke
Ken Flores
Tim Metts
Agenda
Scenario
Obstacles
Regression Model
Bootstrapping
Applications and Uses
Results
Scenario
You have been recently hired as the statistician for the University
of Notre Dame football team. You are tasked with performing a
statistical analysis for the first year of the Charlie Weis era.
Specifically, you have been asked to develop a regression model
that explains the relationship between key statistical categories
and the number of points scored by the offense. You have a
limited number of data points, so you must also find a way to
ensure that the regression results generated by the model are
reliable and significant.
Problems/Obstacles:
The initial data set is not large enough to use simple random sampling without replacement.
[Figure: simulated sampling distributions for sample sizes N = 1 through N = 4, illustrating the Central Limit Theorem.1]
1. http://www.statisticalengineering.com/central_limit_theorem_(summary).htm
Through Monte Carlo simulation we have been able to replicate the original population.
Units are sampled from the population one at a time, with each unit being replaced before the next is sampled.
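The one-at-a-time sampling with replacement described above can be sketched in Python. This is a minimal illustration, not the team's actual workflow; the `opponents` list and the helper name are hypothetical:

```python
import random

# Hypothetical observations, one per game (illustrative only)
opponents = ["Pittsburgh", "Michigan", "USC", "Navy", "Stanford"]

def sample_with_replacement(data, n, seed=None):
    """Draw n units one at a time; each unit is 'replaced' before
    the next draw, so the same unit can appear more than once."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(n)]

resample = sample_with_replacement(opponents, len(opponents), seed=1)
```

Because every draw sees the full data set, repeats are possible even when the resample is the same size as the original.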
Heteroscedasticity vs. Homoscedasticity
Heteroscedasticity: nonconstant variance of the residuals
Homoscedasticity: constant variance of the residuals
[Figure: side-by-side residual plots contrasting heteroscedastic and homoscedastic error variance.]
b1 = Total Yards Gained
b2 = Penalty Yards
b3 = Total Plays
b4 = Turnovers
Adjusted R2 = 74.22%
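A model with these four predictors can be fit by ordinary least squares and scored with adjusted R². The sketch below uses synthetic data generated for illustration (the coefficients, sample values, and noise level are assumptions, not the real 2005 statistics):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12  # one observation per game

# Synthetic predictors (illustrative only, not the actual season data)
total_yards   = rng.normal(400, 60, n)
penalty_yards = rng.normal(50, 15, n)
total_plays   = rng.normal(70, 8, n)
turnovers     = rng.integers(0, 4, n).astype(float)

# Synthetic response with an assumed linear relationship plus noise
points = (0.08 * total_yards - 0.05 * penalty_yards
          - 0.1 * total_plays - 3.0 * turnovers + rng.normal(0, 3, n))

# Design matrix with an intercept column, then OLS via least squares
X = np.column_stack([np.ones(n), total_yards, penalty_yards,
                     total_plays, turnovers])
beta, *_ = np.linalg.lstsq(X, points, rcond=None)

# R^2 and adjusted R^2 (k = number of predictors, excluding intercept)
resid = points - X @ beta
ss_res = resid @ resid
ss_tot = ((points - points.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
k = X.shape[1] - 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

Adjusted R² penalizes the four predictors for the small sample, which is why it sits below the raw R².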
How It Works
Bootstrapping is a method for estimating the sampling distribution of an estimator by resampling with replacement from the original sample.
The bootstrap samples are generated by resampling the original data many times (Monte Carlo simulation).
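The procedure above can be sketched for a simple estimator such as the mean. This is a generic illustration; the `points` values are hypothetical per-game scores, not the team's data:

```python
import random
import statistics

# Hypothetical per-game points scored (illustrative numbers only)
points = [42, 17, 44, 36, 49, 28, 41, 34, 42, 24, 38, 20]

def bootstrap_distribution(data, stat, n_resamples=1000, seed=0):
    """Estimate the sampling distribution of `stat` by drawing many
    same-size resamples with replacement from the original sample."""
    rng = random.Random(seed)
    return [stat([rng.choice(data) for _ in data])
            for _ in range(n_resamples)]

dist = bootstrap_distribution(points, statistics.mean)
se = statistics.stdev(dist)  # bootstrap standard error of the mean
```

The spread of `dist` approximates the sampling variability of the estimator without any distributional formula.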
Characteristics of Bootstrapping
Sampling with Replacement
Full Sample
Bootstrapping Example
Original Data Set (limited number of observations): Pittsburgh, Michigan, Michigan State, Washington, Purdue, USC, BYU, Tennessee, Navy, Syracuse, Stanford, Ohio State
Random sampling with replacement can be employed to create multiple independent samples for analysis.
1st Random Sample (note the repeated opponents): Washington, USC, Ohio State, USC, BYU, Stanford, Pittsburgh, Navy, Ohio State, Ohio State, Stanford, Michigan
109 copies of each observation create a much larger sample with which to work.
When It Should Be Used
Bootstrapping is especially useful in situations when no analytic formula for the sampling distribution is available.
Bootstrap vs. Jackknife
Bootstrap:
- No equal-variance assumption
- Yields slightly different results when repeated on the same data (when estimating the standard error)
- Not bound to theoretical distributions
Jackknife:
- Less general technique
- Explores sample variation differently
- Yields the same result each time
- Similar data requirements
Bootstrap vs. Cross-Validation
Bootstrap:
- Requires only a small amount of data
- More complex, time-consuming technique
Cross-Validation:
- Not a resampling technique
- Requires large amounts of data
- Extremely useful in data mining and artificial intelligence
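The bootstrap/jackknife contrast can be made concrete in code. This is a minimal sketch on hypothetical data (`data` is illustrative): the leave-one-out jackknife is deterministic, while the bootstrap standard error varies slightly from run to run unless seeded.

```python
import math
import random
import statistics

data = [0.74, 0.75, 0.80, 0.52, 0.64, 0.99, 0.93, 0.81]  # hypothetical

def jackknife_se(data, stat):
    """Leave-one-out jackknife SE; no randomness, so the result
    is identical every time it is run on the same data."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    m = statistics.mean(loo)
    return math.sqrt((n - 1) / n * sum((x - m) ** 2 for x in loo))

def bootstrap_se(data, stat, n_resamples=2000, seed=None):
    """Bootstrap SE via Monte Carlo resampling; repeated runs give
    slightly different answers unless the seed is fixed."""
    rng = random.Random(seed)
    reps = [stat([rng.choice(data) for _ in data])
            for _ in range(n_resamples)]
    return statistics.stdev(reps)
```

For the sample mean, the jackknife SE reduces exactly to s/√n, while the bootstrap only approximates it, with a little Monte Carlo noise on top.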
Bootstrapping Results
R2 Data

Sample #   Adjusted R^2      Sample #   Adjusted R^2
 1         0.7351            13         0.7482
 2         0.7545            14         0.8719
 3         0.7438            15         0.7391
 4         0.7968            16         0.9025
 5         0.5164            17         0.8634
 6         0.6449            18         0.7927
 7         0.9951            19         0.6797
 8         0.9253            20         0.6765
 9         0.8144            21         0.8226
10         0.7631            22         0.9902
11         0.8257            23         0.8812
12         0.9099            24         0.9169
The mean, standard deviation, and 95% and 99% confidence intervals are then calculated in Excel from the 24 observations.
Bootstrapping Results
R2 Data
Mean: 0.8046
STDEV: 0.1131
Conf 95%: ±0.0453, or 75.93 - 84.98%
Conf 99%: ±0.0595, or 74.51 - 86.41%
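The Excel summary can be reproduced directly from the 24 adjusted R² values in the table. The sketch below uses normal critical values (1.96 and 2.576), which is what Excel's CONFIDENCE function assumes; that choice is an inference from the reported half-widths, not something the deck states:

```python
import math
import statistics

# The 24 bootstrap-sample adjusted R^2 values from the results table
adj_r2 = [0.7351, 0.7545, 0.7438, 0.7968, 0.5164, 0.6449,
          0.9951, 0.9253, 0.8144, 0.7631, 0.8257, 0.9099,
          0.7482, 0.8719, 0.7391, 0.9025, 0.8634, 0.7927,
          0.6797, 0.6765, 0.8226, 0.9902, 0.8812, 0.9169]

n = len(adj_r2)
mean = statistics.mean(adj_r2)        # ~0.8046
sd = statistics.stdev(adj_r2)         # ~0.1131
half95 = 1.96 * sd / math.sqrt(n)     # ~0.0453 half-width at 95%
half99 = 2.576 * sd / math.sqrt(n)    # ~0.0595 half-width at 99%
ci95 = (mean - half95, mean + half95)
ci99 = (mean - half99, mean + half99)
```

Running this recovers the deck's figures: 0.8046 ± 0.0453 gives roughly 75.9% to 85.0% at the 95% level.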
Questions?