Statistics Cheat Sheet
Mr. Roth, Mar 2004

1. Fundamentals
a. Population – everybody to be analysed
  • Parameter – number summarizing the population
b. Sample – subset of the population we collect data on
  • Statistic – number summarizing the sample
c. Quantitative variables – a number
  • Discrete – countable (# cars in family)
  • Continuous – measurements; there is always another possible value between any two values
d. Qualitative
  • Nominal – just a name
  • Ordinal – order matters (low, mid, high)

Choosing a Sample
• Sample frame – list of the population we choose the sample from
• Biased – the sample differs from population characteristics
• Volunteer sample – any of the three types below may end up as a volunteer sample if people choose whether to respond

Sample Designs
e. Judgement sample: choose what we think represents the population
  • Convenience sample – easily accessed people
f. Probability sample: elements selected by probability
  • Simple random sample – every element has an equal chance
  • Systematic sample – almost random, but we choose by a method
g. Census – data on every member of the population

Stratified Sampling
Divide the population into subpopulations based on characteristics
h. Proportional: sample in proportion to the total population
i. Stratified random: select randomly within each stratum
j. Cluster: select within representative clusters

Collect the Data
k. Experiment: control the environment
l. Observation: measure without influencing the responses

2. Single Variable Data – Distributions
m. Graphing categorical data: pie & bar charts
n. Histogram (classes, count within each class)
o. Describe shape, center, spread. Shapes: symmetric, skewed right, skewed left
p. Stemplots (stem | leaves):
   0 | 11222        0 | 112233
   1 | 011333       0 | 56677
   2 | etc          1 |
q. Mean: $\bar{x} = \sum x_i / n$
r. Median M: if n is odd, the center value; if even, the mean of the 2 center values
s. Boxplot: Min Q1 M Q3 Max
t. Variance: $s^2 = \sum (x - \bar{x})^2 / (n - 1) = SS_x / (n - 1)$
u. p78: standard deviation, $s = \sqrt{s^2}$
v. $SS_x = \sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2 / n$
w. Density curve – relative proportion within classes; area under the curve = 1
x. Normal distribution: 68%, 95%, 99.7% of values lie within 1, 2, 3 standard deviations of the mean
y. p98: z-score $z = (x - \bar{x}) / s$ or $z = (x - \mu) / \sigma$
z. Standard normal: N(0,1); standardizing N(μ,σ) values with z gives N(0,1)

3. Bivariate – Scatterplots & Correlation
a. Explanatory – independent variable
b. Response – dependent variable
c. Scatterplot: form, direction, strength, outliers
d. – e.g. form is linear, direction negative, …
e. – to add a categorical variable, use a different color/symbol
f. p147: Linear correlation – direction & strength of a linear relationship
g. Pearson's coefficient: −1 ≤ r ≤ 1; r = 1 is perfectly linear with + slope, r = −1 is perfectly linear with − slope
h. $r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right) = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}}$
i. $r = \sum z_x z_y / (n - 1)$
j. $SS_{xy} = \sum xy - (\sum x)(\sum y) / n$

4. Regression
k. Least squares – the sum of squares of the vertical errors is minimized
l. p154: $\hat{y} = b_0 + b_1 x$, or $\hat{y} = a + bx$
m. (same as y = mx + b)
n. $b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{SS_{xy}}{SS_x} = r \, \frac{s_y}{s_x}$
o. The line passes through the centroid $(\bar{x}, \bar{y})$, so $a = \bar{y} - b\bar{x}$
p. $b_0 = (\sum y - b_1 \sum x) / n$
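The single-variable formulas in section 2 (items q, r, t, u, v, y) can be checked numerically. Below is a minimal Python sketch on a made-up sample of six values; the data and variable names are illustrative only, not from the original sheet. It computes $SS_x$ both ways from item v and compares the result with the standard library's `statistics.variance`, which also divides by n − 1.

```python
import math
import statistics

data = [2, 4, 4, 5, 7, 8]  # made-up sample for illustration
n = len(data)

mean = sum(data) / n                                      # q. mean = sum(x) / n
median = statistics.median(data)                          # r. even n: mean of the 2 center values
ss_def = sum((x - mean) ** 2 for x in data)               # v. SS_x = sum((x - mean)^2)
ss_short = sum(x * x for x in data) - sum(data) ** 2 / n  # v. shortcut: sum(x^2) - (sum x)^2 / n
variance = ss_def / (n - 1)                               # t. s^2 = SS_x / (n - 1)
sd = math.sqrt(variance)                                  # u. s = sqrt(s^2)
z_scores = [(x - mean) / sd for x in data]                # y. z = (x - mean) / s

print(mean, median, variance)  # 5.0 4.5 4.8
print(math.isclose(ss_def, ss_short), math.isclose(variance, statistics.variance(data)))  # True True
```

The z-scores add up to (essentially) zero, since the deviations from the mean always sum to zero.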
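The correlation and least-squares formulas in sections 3 and 4 (items 3h–j and 4n–r) fit together, and a short sketch makes that visible. This assumes made-up paired data; the names are hypothetical. It computes r two ways (from the SS sums and from z-score products), then the slope and intercept, and checks that $b_1 = r \, s_y / s_x$, that the line passes through the centroid, and that $r^2$ matches the share of variation explained.

```python
import math

# Made-up paired data for illustration (x = explanatory, y = response)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# 2v and 3j: shortcut sums of squares
ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

# 3h: r = SS_xy / sqrt(SS_x * SS_y)
r = ss_xy / math.sqrt(ss_x * ss_y)

# 3i: same r from products of z-scores, divided by n - 1
s_x = math.sqrt(ss_x / (n - 1))
s_y = math.sqrt(ss_y / (n - 1))
r_z = sum(((a - mean_x) / s_x) * ((b - mean_y) / s_y) for a, b in zip(x, y)) / (n - 1)

# 4n and 4p: least-squares slope and intercept
b1 = ss_xy / ss_x
b0 = (sum(y) - b1 * sum(x)) / n

# 4r: residual = observed - predicted
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# 4q: r^2 = share of the variation in y explained by the line
r_squared = 1 - sum(e * e for e in residuals) / ss_y

print(round(r, 4), round(b1, 4), round(b0, 4))  # 0.7746 0.6 2.2
```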
4. Regression (continued)
q. r² is the proportion of the variation in y explained by the linear relationship
r. Residual = $y - \hat{y}$ = observed − predicted
s. Outliers: in the y direction → large residuals; in the x direction → often influential on the least squares line
t. Extrapolation – predicting beyond the domain studied
u. Lurking variable
v. Association doesn't imply causation

5. Data – Sampling
a. Population: entire group
b. Sample: part of the population we examine
c. Observation: measures but does not influence the response
d. Experiment: treatments are controlled & responses observed
e. Confounded variables (explanatory or lurking): their effects on the response variable cannot be distinguished
f. Sampling types: voluntary response – biased toward the opinionated; convenience – easiest to reach
g. Bias: systematically favors certain outcomes
h. Simple random sample (SRS): every set of n individuals has an equal chance of being chosen
i. Probability sample: chosen by known probabilities
j. Stratified random: SRS within each stratum
k. Response bias – lying or behavioral influence on answers

6. Experiments
a. Subjects: individuals in the experiment
b. Factors: explanatory variables in the experiment
c. Treatment: combination of specific values for each factor
d. Placebo: treatment to nullify confounding factors
e. Double-blind: treatments unknown to both the subjects & the individual investigators
f. Control group: controls effects of lurking variables
g. Completely randomized design: subjects allocated randomly among treatments
h. Randomized comparative experiments: similar groups, so nontreatment influences operate equally
i. Experimental design: control effects of lurking variables, randomize assignments, use enough subjects to reduce chance variation
j. Statistical significance: observed effect would rarely occur by chance alone
k. Block design: randomization within a block of similar individuals (men vs women)

7. Probability & Odds
a. 2 definitions:
b. 1) Experimental: observed likelihood of a given outcome within an experiment
c. 2) Theoretical: relative frequency/proportion of a given event among all possible outcomes (sample space)
d. Event: outcome of a random phenomenon
e. n(S) – number of points in the sample space
f. n(A) – number of points that belong to A
g. p183: Empirical: P′(A) = n(A)/n = # observed / # attempted
h. p185: Law of large numbers – experimental probability → theoretical probability as the number of trials grows
i. p194: Theoretical P(A) = n(A)/n(S) = favorable/possible
j. 0 ≤ P(A) ≤ 1; the probabilities of all outcomes sum to 1
k. p189: S = sample space, n(S) = # sample points. Represented as a listing {(, ), …}, tree diagram, or grid
l. p197: Complementary events: P(A) + P(Ā) = 1
m. p200: Mutually exclusive events: both can't happen at the same time
n. p203: Addition rule: P(A or B) = P(A) + P(B) − P(A and B) [P(A and B) = 0 if exclusive]
o. p207: Independent events: the occurrence (or not) of A does not affect P(B), and vice versa
p. Conditional probability: P(A|B) – probability of A given that B has occurred; P(B|A) – probability of B given that A has occurred
q. Independent events iff P(A|B) = P(A) and P(B|A) = P(B)
r. Special multiplication rule (independent events): P(A and B) = P(A)·P(B)
s. General multiplication rule: P(A and B) = P(A)·P(B|A) = P(B)·P(A|B)
t. Odds / Permutations
u. Order important vs not (probability of picking four numbers)
v. Permutations: nPr = n!/(n − r)!, number of ways to pick r items from n items if order is important. Note: with repetitions, p alike and q alike: n!/(p! q!)
w. Combinations: nCr = n!/((n − r)! r!), number of ways to pick r items from n items if order is NOT important
x. Replacement vs not (A A K K K Q Q J J J J 10): (a) pick an A, replace it, then pick a K; (b) pick a K, keep it, pick another
y. Fair odds: if the chance of winning is 1/1000 and the payout is 1000, it may take 3000 plays to win, or you may win after 200

8. Probability Distribution
a. Review: number of heads from tossing 3 coins. List the grid {HHH, …, TTT}, then chart #heads vs frequency: {(0,1), (1,3), (2,3), (3,1)} – note Pascal's triangle
b. Random variable – e.g. #heads in the chart above. "Assumes a unique numerical value for each outcome in the sample space of a probability experiment."
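The probability rules in section 7 (items i and k–s) can all be checked by brute force on a small, fully listed sample space. A minimal sketch, assuming two fair dice and four made-up events A–D (not from the original sheet): events are Python sets of outcomes, so "and" is `&`, "or" is `|`, and the complement is `S - A`. `Fraction` keeps every probability exact.

```python
from fractions import Fraction
from itertools import product

# k. Sample space for rolling two dice: 36 equally likely ordered pairs
S = set(product(range(1, 7), repeat=2))

def P(event):
    """i. Theoretical P(A) = n(A) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))

def P_given(a, b):
    """p. Conditional probability P(A|B) = P(A and B) / P(B)."""
    return P(a & b) / P(b)

# Made-up events for illustration
A = {s for s in S if sum(s) == 7}  # total is 7
B = {s for s in S if s[0] == 3}    # first die shows 3
C = {s for s in S if sum(s) == 8}  # total is 8
D = {s for s in S if sum(s) == 2}  # total is 2 (cannot happen together with A)

assert P(A) + P(S - A) == 1                # l. complementary events
assert P(A & D) == 0                       # m. A and D are mutually exclusive
assert P(A | B) == P(A) + P(B) - P(A & B)  # n. addition rule
assert P_given(A, B) == P(A)               # q. A and B are independent ...
assert P(A & B) == P(A) * P(B)             # r. ... so the special rule applies
assert P_given(C, B) != P(C)               # q. C and B are NOT independent
assert P(C & B) == P(B) * P_given(C, B)    # s. the general rule still holds
```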

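The counting items in section 7 (u–x) are easy to confirm with Python's `math.perm` and `math.comb` (Python 3.8+). A minimal sketch, assuming a made-up lottery of 4 numbers from 20, the letters "AAKKK" for the "p alike, q alike" note, and, for card case (b), that "pick another" means another K; that reading is an assumption.

```python
import math
from fractions import Fraction
from itertools import permutations

# u-w. Picking r = 4 numbers from n = 20 (made-up sizes)
n, r = 20, 4
ordered = math.perm(n, r)    # v. nPr = n!/(n - r)!, order important
unordered = math.comb(n, r)  # w. nCr = n!/((n - r)! r!), order NOT important
print(ordered, unordered)    # 116280 4845

# v. note: arrangements of n items with p alike and q alike = n!/(p! q!)
word = "AAKKK"  # p = 2 A's, q = 3 K's, n = 5
arrangements = len(set(permutations(word)))
assert arrangements == math.factorial(5) // (math.factorial(2) * math.factorial(3))

# x. The deck A A K K K Q Q J J J J 10 (12 cards)
deck = ["A"] * 2 + ["K"] * 3 + ["Q"] * 2 + ["J"] * 4 + ["10"]

def chance(card, cards):
    """Probability of drawing `card` from the list `cards`."""
    return Fraction(cards.count(card), len(cards))

# (a) pick an A, replace it, then pick a K: the deck is unchanged for the 2nd draw
with_replacement = chance("A", deck) * chance("K", deck)

# (b) pick a K, keep it, pick another K (assumed reading): one fewer K and one fewer card
rest = deck.copy()
rest.remove("K")
without_replacement = chance("K", deck) * chance("K", rest)

print(with_replacement, without_replacement)  # 1/24 1/22
```

Keeping the card changes both the count of K's and the total for the second draw, which is why (b) is 3/12 · 2/11 rather than 3/12 · 3/12.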
51268775.doc -2- Printed 2/9/2011
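The 3-coin exercise in section 8 (item a, with the P(x) column of item e) can be generated instead of listed by hand. A minimal sketch with hypothetical variable names: it builds the grid HHH…TTT, counts heads, checks the frequencies against row 3 of Pascal's triangle, and turns them into a probability distribution that obeys both properties in item f.

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

# 8a. Grid of all outcomes for tossing 3 coins: HHH, HHT, ..., TTT
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# #heads vs frequency chart
freq = Counter(o.count("H") for o in outcomes)
chart = sorted(freq.items())
print(chart)  # [(0, 1), (1, 3), (2, 3), (3, 1)]

# The frequencies are row 3 of Pascal's triangle: C(3, 0), C(3, 1), C(3, 2), C(3, 3)
assert [f for _, f in chart] == [math.comb(3, k) for k in range(4)]

# 8e-f. Probability distribution P(x) = frequency / total outcomes
P = {x: Fraction(f, len(outcomes)) for x, f in chart}
assert all(0 <= p <= 1 for p in P.values()) and sum(P.values()) == 1
```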


8. Probability Distribution (continued)
c. Discrete – countable number of values
d. Continuous – infinitely many possible values
e. Probability distribution: next to the coin frequency chart, add a P(x) column with values 1/8, 3/8, 3/8, 1/8
f. Probability function: obeys the two properties of probability, 0 ≤ P(x) ≤ 1 and ∑ P(x) = 1 over all outcomes
g. Parameter: unknown number describing the population
h. Statistic: number computed from sample data
i. Notation:
                    Sample    Population
   Mean             x̄         μ (mu)
   Variance         s²        σ²
   Std deviation    s         σ (sigma)

   Base: $\bar{x} = \sum x / n$, $s^2 = \sum (x - \bar{x})^2 / (n - 1)$

   Frequency distribution:   mean $\bar{x} = \sum xf / \sum f$, variance $s^2 = \sum (x - \bar{x})^2 f / (\sum f - 1)$, std dev $s = \sqrt{s^2}$
   Probability distribution: mean $\mu = \sum [x P(x)]$, variance $\sigma^2 = \sum [(x - \mu)^2 P(x)]$, std dev $\sigma = \sqrt{\sigma^2}$
j. Probability acts as the relative frequency f / ∑f, so the probability-distribution formulas lose the −1.

9. Sampling Distribution
a. By the law of large numbers, as n increases, $\bar{x} \to \mu$
b. Given $\bar{x}$, the mean of an SRS of size n from a population with mean μ and standard deviation σ: the sampling distribution of $\bar{x}$ has mean μ and standard deviation $\sigma / \sqrt{n}$
c. If individual observations have the normal distribution N(μ, σ), then $\bar{x}$ of n observations has the $N(\mu, \sigma / \sqrt{n})$ distribution
d. Central Limit Theorem: given an SRS of size n from a population with mean μ and standard deviation σ, when n is large the sample mean $\bar{x}$ is approximately normal, $N(\mu, \sigma / \sqrt{n})$

10. Binomial Distribution
a. Binomial experiment. Emphasize "bi": two possible outcomes (success, failure). n repeated identical, independent trials, with P(success) + P(failure) = 1. The binomial random variable x is the count of successful trials, 0 ≤ x ≤ n
b. p: probability of success on each observation
c. Binomial coefficient: $_nC_k = \binom{n}{k} = \frac{n!}{(n - k)! \, k!}$
d. Binomial probability: $P(x = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
e. Binomial $\mu = np$
f. Binomial $\sigma = \sqrt{np(1 - p)}$

11. Confidence Intervals
a. Statistical inference: methods for drawing conclusions about a population from a sample
b. If $\bar{x}$ is unbiased, use it to estimate μ
c. Confidence interval: estimate ± margin of error
d. Confidence level C: probability that the interval captures the true parameter value in repeated samples
e. Given an SRS of size n from a normal population, a level C confidence interval for μ is $\bar{x} \pm z^* \sigma / \sqrt{n}$
f. Sample size for a desired margin of error: set the ± term above ($z^* \sigma / \sqrt{n}$) equal to the desired margin and solve for n

12. Tests of Significance
g. Assess the evidence supporting a claim about a population
h. Idea: an outcome that would rarely happen if the claim were true is evidence that the claim is not true
i. H₀ – null hypothesis: the test is designed to assess evidence against H₀. Usually a statement of "no effect"
j. Hₐ – alternative hypothesis about the population parameter, contrasting with the null
k. Two-sided: H₀: μ = 0, Hₐ: μ ≠ 0
l. P-value: the probability, assuming H₀ is true, that the test statistic would be as extreme as or more extreme than the value observed (smaller P-value = stronger evidence against H₀)
m. $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
n. Significance level α: if α = .05, a result this extreme happens no more than 5% of the time when H₀ is true. "Results were significant (P < .01)"
o. A level α two-sided test rejects H₀: μ = μ₀ when μ₀ falls outside a level 1 − α confidence interval
a. Complicating factors: not a complete SRS from the population, multistage & many-factor designs, outliers, non-normal distribution, σ unknown
b. Undercoverage and nonresponse are often more serious than the random sampling error accounted for by the confidence interval
c. Type I error: reject H₀ when it is true; α gives the probability of this error
d. Type II error: accept H₀ when Hₐ is true
e. Power = 1 − P(Type II error)
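The binomial formulas in section 10 and the probability-distribution formulas in section 8 (item i) should agree, and the 3-coin example is a convenient check. A minimal sketch, assuming 3 fair coins (n = 3, p = 1/2) plus a made-up float case (n = 10, p = 0.3); `binomial_pmf` is a hypothetical helper name, not a library function.

```python
import math
from fractions import Fraction

def binomial_pmf(k, n, p):
    """10d. P(x = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# 3 fair coins again: n = 3 trials, success = heads, p = 1/2
n, p = 3, Fraction(1, 2)
dist = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

# 8i. Probability-distribution mean and variance: mu = sum x P(x), sigma^2 = sum (x - mu)^2 P(x)
mu = sum(x * px for x, px in dist.items())
var = sum((x - mu) ** 2 * px for x, px in dist.items())

# 10e-f. The binomial shortcuts agree: mu = np, sigma = sqrt(np(1 - p))
assert mu == n * p
assert var == n * p * (1 - p)
sigma = math.sqrt(float(var))
print(mu, var, round(sigma, 4))  # 3/2 3/4 0.866

# Made-up float example: n = 10 trials, p = 0.3; the probabilities sum to 1
probs = [binomial_pmf(k, 10, 0.3) for k in range(11)]
assert math.isclose(sum(probs), 1.0)
```

With exact fractions the 8i formulas reproduce np = 3/2 and np(1 − p) = 3/4 exactly, matching the 1/8, 3/8, 3/8, 1/8 column from the coin chart.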

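The confidence-interval and test formulas in sections 11 and 12 (items 11e–f, 12l–o) can be tried with `statistics.NormalDist` from the standard library (Python 3.8+). A minimal sketch, assuming σ is known and using made-up numbers (n = 25, σ = 10, x̄ = 52, H₀: μ = 48); the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* with area `confidence` between -z* and z* under N(0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def z_interval(xbar, sigma, n, confidence=0.95):
    """11e. xbar +/- z* sigma / sqrt(n), with sigma assumed known."""
    margin = z_star(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def sample_size(sigma, margin, confidence=0.95):
    """11f. Smallest n with z* sigma / sqrt(n) <= margin."""
    return math.ceil((z_star(confidence) * sigma / margin) ** 2)

def z_test_two_sided(xbar, mu0, sigma, n):
    """12m and 12l: z statistic and two-sided P-value for H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up numbers: n = 25, known sigma = 10, observed mean 52, H0: mu = 48
low, high = z_interval(52.0, 10.0, 25)
z, p_value = z_test_two_sided(52.0, 48.0, 10.0, 25)
print(round(low, 2), round(high, 2))  # 48.08 55.92
print(z, round(p_value, 4))           # 2.0 0.0455
print(sample_size(10.0, 2.0))         # 97
```

Here μ₀ = 48 lies outside the 95% interval and P ≈ 0.0455 < .05, so both views reject H₀ at α = .05, as item 12o says they must.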