STATISTICS
Course notes: Descriptive statistics
Distributions
$N \sim (\mu, \sigma^2)$
N stands for normal;
~ stands for a distribution;
$\mu$ is the mean;
$\sigma^2$ is the variance.
The Normal Distribution
[Figure: Controlling for the standard deviation. Normal curves plotted from the origin with means μ = 470, μ = 743, and μ = 960.]
[Figure: Controlling for the mean. Normal curves plotted from the origin, each with mean μ = 743; labeled σ = 210.]
The Standard Normal Distribution
The Central Limit Theorem (CLT) is one of the greatest statistical insights. It states that, no matter the underlying distribution of the
dataset (provided its variance is finite), the sampling distribution of the means approximates a normal distribution. Moreover, the mean
of the sampling distribution equals the mean of the original distribution, and its variance is n times smaller, where n is the size of
each sample. The CLT applies whenever we have a sum or an average of many variables (e.g. the sum of the numbers rolled when rolling dice).
➢ No matter the distribution
➢ The distribution of the sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4, \dots, \bar{x}_k$ would tend to $N \sim \left(\mu, \dfrac{\sigma^2}{n}\right)$
➢ The more samples, the closer to Normal ($k \to \infty$)
➢ The bigger the samples, the closer to Normal ($n \to \infty$)
The CLT allows us to assume normality for many different variables. That is very useful for confidence intervals, hypothesis testing,
and regression analysis. In fact, the Normal distribution is so predominantly observed around us because, following the CLT, many
variables converge to Normal.
Since many concepts and events are a sum or an average of different effects, the CLT applies and we observe normality all the time.
For example, in regression analysis, the dependent variable is explained through the sum of error terms.
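The CLT statements above can be checked with a small simulation. The following is a minimal sketch, assuming a fair six-sided die as the population (mean 3.5, variance 35/12); the helper name `sample_means`, the seed, and the values of n and k are illustrative choices, not part of the course material.

```python
import random
import statistics

def sample_means(draw, n, k, seed=0):
    """Return k sample means, each computed from n independent draws."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(k)]

def roll_die(rng):
    """One roll of a fair six-sided die (a clearly non-normal population)."""
    return rng.randint(1, 6)

POP_MEAN = 3.5       # mean of a fair die
POP_VAR = 35 / 12    # variance of a fair die

n, k = 30, 20_000    # sample size and number of samples (assumed values)
means = sample_means(roll_die, n, k)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.pvariance(means)

print("mean of the sample means:    ", round(mean_of_means, 3))
print("variance of the sample means:", round(var_of_means, 4))
print("population variance / n:     ", round(POP_VAR / n, 4))
```

With these settings the mean of the sample means should land close to 3.5 and their variance close to σ²/n, which is what the theorem predicts.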
Estimators and Estimates
Estimators
Broadly, an estimator is a mathematical function that approximates a population parameter depending only on sample information.
Examples of estimators and the corresponding parameters:
Term | Estimator | Parameter
Mean | $\bar{x}$ | $\mu$
Variance | $s^2$ | $\sigma^2$
Estimators have two important properties:
• Bias
The expected value of an unbiased estimator is the population parameter. The bias in this case is 0. If the expected value of an
estimator is (parameter + b), then the bias is b.
• Efficiency
The most efficient estimator is the one with the smallest variance.
Estimates
An estimate is the output that you get from the estimator (when you apply the formula). There are two types of estimates: point
estimates and confidence interval estimates.
Examples of each type (the rows are unrelated examples, not matched pairs):
Point estimates | Confidence intervals
1 | (1, 5)
5 | (12, 33)
122.67 | (221.78, 745.66)
0.32 | (-0.71, 0.11)
Confidence intervals are more informative than point estimates because they also express the uncertainty around the estimate. That is
why they are preferred when making inferences.
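The notion of bias can be illustrated numerically. The sketch below compares two variance estimators on simulated data: one dividing the sum of squared deviations by n, the other by n - 1 (the sample variance s²). The population parameters, sample size, number of trials, and seed are assumed values chosen only for illustration.

```python
import random

rng = random.Random(1)
MU, SIGMA = 743.0, 210.0   # assumed population mean and standard deviation
TRUE_VAR = SIGMA ** 2      # population variance, 44100
n, trials = 5, 20_000      # small samples make the bias easy to see

sum_biased = 0.0
sum_unbiased = 0.0
for _ in range(trials):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    sum_biased += ss / n          # divides by n: biased estimator
    sum_unbiased += ss / (n - 1)  # divides by n - 1: sample variance s^2

avg_biased = sum_biased / trials
avg_unbiased = sum_unbiased / trials

print("true variance:               ", TRUE_VAR)
print("average of ss / n:           ", round(avg_biased))
print("average of ss / (n - 1):     ", round(avg_unbiased))
print("estimated bias of ss / n:    ", round(avg_biased - TRUE_VAR))
```

The average of the n - 1 version should sit near the true variance, while the n version should come out low by roughly σ²/n, which is its bias b in the sense defined above.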
Confidence Intervals and the Margin of Error
[Figure: a confidence interval on a number line, marking the interval start, the point estimate, and the interval end.]
Definition: A confidence interval is an interval within which we are confident (with a certain percentage of confidence) the population parameter will fall.
(1-α) is the level of confidence. We are (1-α)*100% confident that the population parameter will fall in the specified interval. Common alphas are: 0.01, 0.05, 0.1.
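General formula:
$[\bar{x} - ME,\ \bar{x} + ME]$, where ME is the margin of error.
$ME = \text{reliability factor} \times \dfrac{\text{standard deviation}}{\sqrt{\text{sample size}}}$
The reliability factor is either a z or a t statistic:
$ME = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$ or $ME = t_{\nu,\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$
Term | Effect on width of CI
(1-α) ↑ | ↑
σ ↑ | ↑
n ↑ | ↓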
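The margin-of-error formula with the z reliability factor can be sketched in a few lines. This is an illustrative example only: the function name `z_confidence_interval` and the input values (sample mean 743, σ = 210, n = 100) are assumptions, and the z quantile comes from Python's standard `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """Return (start, end) of the (1 - alpha) CI for the mean, sigma known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability factor z_{alpha/2}
    me = z * sigma / sqrt(n)                 # margin of error
    return x_bar - me, x_bar + me

def width(interval):
    """Width of an interval given as (start, end)."""
    return interval[1] - interval[0]

# Assumed inputs, for illustration only.
base = z_confidence_interval(x_bar=743, sigma=210, n=100, alpha=0.05)
more_confident = z_confidence_interval(743, 210, 100, alpha=0.01)
more_spread = z_confidence_interval(743, 420, 100, alpha=0.05)
bigger_sample = z_confidence_interval(743, 210, 400, alpha=0.05)

print("95% CI:", tuple(round(v, 2) for v in base))
print("width at 95%, n = 100:      ", round(width(base), 2))
print("width at 99% (higher 1 - a):", round(width(more_confident), 2))
print("width with doubled sigma:   ", round(width(more_spread), 2))
print("width with n = 400:         ", round(width(bigger_sample), 2))
```

The three comparisons mirror the table: a higher confidence level and a larger σ widen the interval, while a larger n narrows it.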
Student’s T Distribution
The Student’s T distribution is used predominantly for creating confidence intervals and testing hypotheses with normally distributed
populations when the sample sizes are small. It is particularly useful when we don’t have enough information or it is too costly to
obtain it.
All else equal, the Student’s T distribution has fatter tails than the Normal distribution and a lower peak. This is to reflect the
higher level of uncertainty, caused by the small sample size.
[Figure: the Student’s T distribution plotted against the Normal distribution.]
A random variable following the t-distribution is denoted $t_{\nu,\alpha}$, where $\nu$ are the degrees of freedom.
We can obtain the Student’s T distribution for a variable with a Normally distributed population using the formula:
$t_{\nu,\alpha} = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
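As a rough illustration of the formula and of the "fatter tails, lower peak" remark, the sketch below computes a t statistic for a small made-up sample and compares the t density with the standard normal density. The sample values, the hypothesized mean, and the choice of 5 degrees of freedom are assumptions; the density functions use the standard textbook formulas.

```python
from math import exp, gamma, pi, sqrt
from statistics import mean, stdev

def t_statistic(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))

def t_pdf(x, nu):
    """Density of the Student's T distribution with nu degrees of freedom."""
    coef = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

sample = [4.9, 5.3, 5.1, 4.7, 5.5]   # made-up measurements
t_value = t_statistic(sample, mu=5.0)
print("t statistic:", round(t_value, 4))

nu = 5  # assumed degrees of freedom for the comparison
print("peak (x = 0): t =", round(t_pdf(0, nu), 4), " normal =", round(normal_pdf(0), 4))
print("tail (x = 3): t =", round(t_pdf(3, nu), 4), " normal =", round(normal_pdf(3), 4))
```

At x = 0 the t density is below the normal density, and at x = 3 it is above it, which matches the description of a lower peak and fatter tails.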
Formulas for Confidence Intervals
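# populations | Population variance | Samples | Statistic | Variance | Formula
One | known | - | z | $\sigma^2$ | $\bar{x} \pm z_{\alpha/2} \dfrac{\sigma}{\sqrt{n}}$
One | unknown | - | t | $s^2$ | $\bar{x} \pm t_{n-1,\alpha/2} \dfrac{s}{\sqrt{n}}$
Two | - | dependent | t | $s^2_{difference}$ | $\bar{d} \pm t_{n-1,\alpha/2} \dfrac{s_d}{\sqrt{n}}$
Two | known | independent | z | $\sigma_x^2, \sigma_y^2$ | $(\bar{x} - \bar{y}) \pm z_{\alpha/2} \sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}$
Two | unknown, assumed equal | independent | t | $s_p^2 = \dfrac{(n_x - 1) s_x^2 + (n_y - 1) s_y^2}{n_x + n_y - 2}$ | $(\bar{x} - \bar{y}) \pm t_{n_x + n_y - 2,\alpha/2} \sqrt{\dfrac{s_p^2}{n_x} + \dfrac{s_p^2}{n_y}}$
Two | unknown, assumed different | independent | t | $s_x^2, s_y^2$ | $(\bar{x} - \bar{y}) \pm t_{\nu,\alpha/2} \sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}$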
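To make one row of the table concrete, here is a minimal sketch of the two-population case with independent samples and unknown variances assumed equal (pooled variance). The data are made up, the function names are hypothetical, and the critical value 2.306 is the usual t-table entry for 8 degrees of freedom at α/2 = 0.025; it is passed in by hand because the Python standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance

def pooled_variance(x, y):
    """s_p^2 = ((n_x - 1) s_x^2 + (n_y - 1) s_y^2) / (n_x + n_y - 2)."""
    nx, ny = len(x), len(y)
    return ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)

def pooled_confidence_interval(x, y, t_crit):
    """CI for the difference of means; t_crit is t_{n_x + n_y - 2, alpha/2}."""
    nx, ny = len(x), len(y)
    sp2 = pooled_variance(x, y)
    me = t_crit * sqrt(sp2 / nx + sp2 / ny)   # margin of error
    diff = mean(x) - mean(y)
    return diff - me, diff + me

# Made-up samples, for illustration only.
x = [12, 15, 11, 14, 13]
y = [10, 9, 12, 11, 8]
T_CRIT = 2.306   # t_{8, 0.025}, looked up in a t table (assumed 95% level)

sp2 = pooled_variance(x, y)
start, end = pooled_confidence_interval(x, y, T_CRIT)
print("pooled variance:", sp2)
print("95% CI for the difference of means:", (round(start, 3), round(end, 3)))
```

The other rows follow the same pattern: only the standard-error term under the square root and the degrees of freedom of the critical value change.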