Академический Документы
Профессиональный Документы
Культура Документы
Statistics - Definition
The scientific study of numerical data based on variation in nature. (Sokal and Rohlf) A set of procedures and rules for reducing large masses of data into manageable proportions allowing us to draw conclusions from those data. (McCarthy)
Basic Terms
Measurement assignment of a number to something Data collection of measurements Sample collected data Population all possible data Variable a property with respect to which data from a sample differ in some measurable way.
Types of Measurements
Ordinal rank order, (1st,2nd,3rd,etc.) Nominal categorized or labeled data (red, green, blue, male, female) Ratio (Interval) indicates order as well as magnitude. An interval scale does not include zero.
Types of Variables
Independent Variable controlled or manipulated by the researcher; causes a change in the dependent variable. (x-axis) Dependent Variable the variable being measured (y-axis) Discreet Variable has a fixed value Continuous Variable - can assume any value
Types of Statistics
Descriptive used to organize and describe a sample Inferential used to extrapolate from a sample to a larger population
Descriptive Statistics
Measures of Central Tendency
- Mean (average) - Median (middle) - Mode (most frequent)
Age
Rank
10
11
35 30 25 20 15 10 5 0
Age
25 24 23 22 21 20
Age
12
Visit_6
Visit 6
Visit 1
13
Descriptive Statistics
Data can usually be characterized by a normal distribution. Central tendency is represented by the peak of the distribution. Dispersion is represented by the width of the distribution.
14
Descriptive Statistics
Normal distribution f(x) = (1/(*(2*)))*exp[-(1/2)*((x-)/)^2]
Mean Median Mode
Frequency
Standard Deviation
Measurement
15
Descriptive Statistics
Skew Distribution
Mode
Frequency
Measurement
16
Descriptive Statistics
Standard Deviations
Frequency
= .5 = 1.0 = 1.5
Measurement
17
Inferential Statistics
Can your experiment make a statement about the general population? Two types 1. Parametric Interval or ratio measurements Continuous variables Usually assumes that data is normally distributed 2. Non-Parametric Ordinal or nominal measurements Discreet variables Makes no assumption about how data is distributed
18
Inferential Statistics
Null Hypothesis
Statistical hypotheses usually assume no relationship between variables. There is no association between eye color and eyesight. If the result of your statistical test is significant, then the original hypothesis is false and you can say that the variables in your experiment are somehow related.
19
Statistical Result for Null Hypothesis Accepted Actual Null Hypothesis TRUE Correct FALSE Type II Error Rejected Type I Error Correct
Unfortunately, and cannot both have very small values. As one decreases, the other increases.
20
Frequency
/2
/2
Measurement
21
/2
/2
1-
22
23
24
25
Inferential Statistics
Effect Size
Detectable difference in means / standard deviation Dimensionless ~ 0.2 small (low power) ~ 0.5 medium ~ 0.8 large (powerful test)
26
27
28
T-Test Example
Group Treadmill 1 - healthy time 2 - diseased in seconds 1 1014 1 684 1 810 1 990 1 840 1 978 1 1002 1 1110 2 864 2 636 2 638 2 708 2 786 2 600 2 1320 2 750 2 594 2 750 Group1 mean 928.5 StDev 138.121 Group2 mean 764.6 StDev 213.75
Example data from SPSS Null Hypothesis There is no difference between healthy people and people with coronary artery disease in time spent on a treadmill.
1000
healthy diseased
800
600
400
200
29
30
F Treadmill time Equal variances in seconds assumed Equal variances not assumed .137
Sig. .716
t 1.873
df 16
Mean Std. Error Sig. (2-tailed) Difference Difference .080 .068 163.900 163.900 87.524 83.388
1.966 15.439
Null hypothesis is accepted because the results are not significant at the 0.05 level.
31
32
Non-Parametric Statistics
Makes no assumptions about the population from which the samples are selected. Used for the analysis of discreet data sets. Also used when data does not meet the assumptions for a parametric analysis (small data sets).
33
Non-Parametric Example I
Mann-Whitney Most commonly used as an alternative to the independent samples T-Test.
Test Statistics b Treadmill time in seconds 15.000 70.000 -2.222 .026 .027
a
34
Post-Hoc test
35
ANOVA - Example
12 13 15 15 17 19 20 20 Mean StDev StErr 15 15 16 17 17 21 21 23 23 23 25 26 27 30 30 31
35 30 25 20 15 10 5 0
36
37
ANOVA - Results
Multiple Comparisons
1.
3.
Test of Homogeneity of Variances VAR00001 Levene Statistic .001 df1 2 df2 21 Sig. .999
Dependent Variable: VAR00001 Tukey HSD Mean Difference (I-J) -1.75000 -10.50000* 1.75000 -8.75000* 10.50000* 8.75000*
2.
VAR00001 Sum of Squares 506.333 205.625 711.958
ANOVA
df 2 21 23
F 25.855
Sig. .000
38
ANOVA Example 2
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Mean StDev 322.000 322.005 322.022 321.991 322.011 321.995 322.006 321.976 321.998 321.996 321.984 321.984 322.004 322.000 322.003 322.002 321.999 0.011 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 322.007 322.031 322.011 322.029 322.009 322.026 322.018 322.007 322.018 321.986 322.018 322.018 322.020 322.012 322.014 322.005 322.014 0.011 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 321.986 321.990 322.002 321.984 322.017 321.983 322.002 322.001 322.004 322.016 322.002 322.009 321.990 321.993 321.991 322.002 321.998 0.010 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 321.994 322.003 322.006 322.003 321.986 322.002 321.998 321.991 321.996 321.999 321.990 322.002 321.986 321.991 321.983 321.998 321.995 0.007
SPSS example Machine type vs. brake diameter 4 group ANOVA Null Hypothesis There is no difference among machines in brake diameter.
39
ANOVA Example 2
40
2.
df 3 60 63
F 11.748
Sig. .000
1.
Test of Homogeneity of Variances Disc Brake Diameter (mm) Levene Statistic .697 df1 3 df2 60 Sig. .557
Multiple Comparisons Dependent Variable: Disc Brake Diameter (mm) Tukey HSD Mean Difference (I-J) -.0157487* .0157487* .0159803* .0188277* -.0159803* -.0188277*
3 4
41
42
43
Chi-Square Test
used with categorical data two variables and two groups on both variables results indicate whether the variables are related
Figure from Geigy Scientific
Tables vol. 2
44
Chi-Square Test
Assumptions: - observations are independent - categories do not overlap - most expected counts > 5 and none < 1 Sensitive to the number of observations Spurious significant results can occur for large n.
45
46
Chi-Square Example
A 1991 U.S. general survey of 225 people asked whether they thought their most important problem in the last 12 months was health or finances. Null hypothesis Males and females will respond the same to the survey.
1 - males 2 - females 2 2 2 2 2 1 2 2
1 - health 2 - finances 1 1 1 1 2 2 2 2
47
Chi-Square Example
problem * group Crosstabulation
Cross-tabulation table shows how many people are in each category. The nonsignificant result signifies that the null hypothesis is accepted.
Females 57 77 134
Pearson Chi-Square a Continuity Correction Likelihood Ratio Fisher's Exact Test Linear-by-Linear Association N of Valid Cases
.582
.319
a. Computed only for a 2x2 table b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 37. 21.
48
Chi-Square Example II
Chi-square test can be extended to multiple responses for two groups.
problem * group Crosstabulation Count group Males Females problem Health 35 57 Finance 56 77 Family 15 33 Personal 9 10 Miscellaneou 15 25 Total 130 202 Total 92 133 48 19 40 332
Chi-Square Tests Value 2.377a 2.400 .021 332 Asymp. Sig. (2-sided) df 4 .667 4 .663 1 .885
a. 0 cells (.0%) have expected count less than 5. T minimum expected count is 7.44.
49
50
Multiple Regression
Null Hypothesis GPA at the end of Freshman year cannot be predicted by performance on college entrance exams. GPA = * ACT score + * Read. Comp. score
51
Multiple Regression
3 2.5 2 y = 0.0382x + 1.56 R2 = 0.0981 3 2.5
2.8
2.6 2.4
GPA
GPA
1.5 1 0.5 0 5 10 15 20
GPA
1.5 1 0.5 0 5 10 15 20
. ad Re
16
p. m Co
12
ACT Score
10
6 4
re co S CT A
8
e or Sc
52
Multiple Regression
ANOVAb
The analysis shows no significant relationship between college entrance tests and GPA.
Model 1
df 2 12 14
F 1.699
Sig. .224a
a. Predictors: (Constant), Reading Comprehension score, ACT score b. Dependent Variable: Grade Point Average in first year of college
Coefficientsa Unstandardized Coefficients B Std. Error 1.172 .460 .018 .034 .042 .031 Standardized Coefficients Beta
Model 1
= .151 = .386
53
MANOVA
Multivariate ANalysis of VAriance (MANOVA) MANOVA allows you to look at differences between variables as well as group differences. assumptions are the same as ANOVA additional condition of multivariate normality also assumes equal covariance matrices (standard deviations between variables should be similar).
54
MANOVA Example
Subset of plantar fasciitis dataset. Null Hypothesis There is no difference in soleus emg activity, peak torque, or time to peak torque for quick stretch measurements in people with plantar fasciitis who receive counterstrain treatment compared with the same group of people receiving a placebo treatment.
Treatment Group 1 - Treated 2 - Control 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 Treated Mean StDev Control Mean StDev Peak to Peak Soleus EMG Response millivolt*2 0.0706 0.0189 0.0062 0.0396 0.0668 0.2524 0.0183 0.0393 0.1319 0.3781 0.039 0.074 0.0396 0.0143 0.076 0.2213 0.0196 0.0498 0.155 0.2887 Peak Quick Time to Stretch Peak Torque Torque milliseconds newton-meter 0.883 0.347 0.388 1.104 3.167 2.628 0.346 1.535 3.282 3.622 0.557 0.525 1.400 0.183 3.074 3.073 0.271 2.278 3.556 3.106 322.56 329.28 319.2 325.92 315.84 248.64 336 332.64 292.32 278.88 299.04 372.96 362.88 295.68 322.56 258.72 346.08 302.4 309.12 292.32 310.128 28.15859087 316.176 35.22039409
55
MANOVA Example
Plantar Study Soleus Peak to Peak EMG
0.25 0.2
Treated Control
Treated Control
0.15 0.1
T o rq u e (n m )
E M G (m v* 2 )
2.5 2
1.5 1
0.05 0
0.5 0
T im e ( m s )
-0.05
56
MANOVA Results
a Box's Test of Equality of Covariance Matrices
Tests the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups. a. Design: Intercept+group
Boxs test checks for equal covariance matrices. A non-significant result means the assumption holds true. Levenes tests checks for univariate normality. A non-significant result means the assumption holds true.
F Soleus peak to peak emg (mv*2) Peak quick stretch torque (nm) Time to peak torque (milliseconds) .349 .078 .550
df1 1 1 1
df2 18 18 18
Tests the null hypothesis that the error variance of the dependent v is equal across groups. a. Design: Intercept+group
57
MANOVA Results
Multivariate Tests b Effect Intercept Pillai's Trace Wilks' Lambda Hotelling's Trace Roy's Largest Root Pillai's Trace Wilks' Lambda Hotelling's Trace Roy's Largest Root Value .996 .004 226.499 226.499 .020 .980 .020 .020 F 1207.992a 1207.992a 1207.992a 1207.992a .109a .109a .109a .109a Hypothesis df 3.000 3.000 3.000 3.000 3.000 3.000 3.000 3.000 Error df 16.000 16.000 16.000 16.000 16.000 16.000 16.000 16.000 Sig. .000 .000 .000 .000 .954 .954 .954 .954
group
The non-significant group result indicates that the null hypothesis is true. If the result had been significant, you would need to do post hoc tests to find out which variables were significant.
58
References
Bruning, James L. and Kintz, B.L.Computational handbook of statistics (4th Edition). Massachusetts: Addison Wesley Longman, Inc., 1997.
59