The Basics
Introduction
When are statistical tests used?
Statistical tests are used when researchers want to determine whether a statistically significant difference exists between two or more sets of numbers. The decision to reject or accept the null hypothesis is based on whether or not the observed values fall in the critical region. What will we try to learn in the next few classes? - Data handling - Use of specific statistical tests
The standard normal (z) distribution can only be used when the population standard deviation is known. Because population standard deviations are usually not known, we cannot ordinarily use the standard normal distribution and its z-scores to draw statistical conclusions from samples.
The three most common distributions used are the t, F, and chi-square distributions.
Degrees of freedom are calculated in different ways for the different distributions, but in general they are related to two things:
1. Number of participants in the study
2. Number of levels of the independent variable
t-distribution
The picture shows the shape of the t-distribution in comparison to the standard normal (or Z) distribution. Notice that the t-distribution becomes flatter with a smaller value of n.
Some characteristics of the t-distribution (also known as Student's t distribution):
1. The mean of the distribution is equal to 0.
2. The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
3. The standard deviation is always greater than 1.
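These properties can be checked numerically. A minimal sketch using scipy's standard t-distribution functions (the df values chosen below are arbitrary examples, not from the slides):

```python
# Sketch: verify the listed t-distribution properties with scipy.
# The df values (3, 10, 30) are arbitrary examples.
from scipy.stats import t, norm

for v in (3, 10, 30):
    mean = t.mean(v)   # equals 0
    var = t.var(v)     # equals v / (v - 2) for v > 2
    sd = t.std(v)      # always greater than 1
    print(f"df={v}: mean={mean:.3f}, var={var:.3f} (v/(v-2)={v/(v-2):.3f}), sd={sd:.3f}")

# As df grows, t approaches the standard normal: compare upper-tail areas at 2.
print(norm.sf(2), t.sf(2, 30), t.sf(2, 3))  # t has heavier tails at small df
```

Note that the tail probability at small df is larger than the normal's, which is exactly why the t-distribution looks flatter than the z-distribution in the picture above.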
F-distribution
The shape of the F-distribution depends on the degrees of freedom of both the numerator and the denominator. Red has df1=2 and df2=3, blue has df1=4 and df2=30, and black has df1=20 and df2=20.
Characteristics of the F-distribution
1. It is not symmetric. The F-distribution is skewed right; that is, it is positively skewed.
2. The shape of the F-distribution depends upon the degrees of freedom in the numerator and the denominator.
3. The total area under the curve is 1.
4. The values of F are always greater than or equal to zero; that is, the F distribution cannot be negative.
The F distribution is used to test whether two population variances are the same.
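As a hedged sketch of that variance-comparison use, a two-sided F-test for equality of two population variances can be computed as follows (the two samples below are invented purely for illustration):

```python
# Sketch: F-test for whether two population variances are equal.
# Sample data invented for illustration only.
import numpy as np
from scipy.stats import f

a = np.array([4.1, 5.3, 6.0, 5.5, 4.8, 5.9])
b = np.array([3.9, 5.1, 4.4, 4.9, 4.6, 5.0, 4.7])

F = a.var(ddof=1) / b.var(ddof=1)    # ratio of sample variances, always >= 0
dfn, dfd = len(a) - 1, len(b) - 1    # numerator and denominator df
# Two-sided p-value: double the smaller tail area under the F curve.
p = 2 * min(f.sf(F, dfn, dfd), f.cdf(F, dfn, dfd))
print(f"F = {F:.3f}, df = ({dfn}, {dfd}), p = {p:.3f}")
```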
Chi-square distribution
Notice that in this picture, as df gets large, the curve becomes less skewed and more normal.
The chi-square statistic is a sum of squared, non-negative values, and therefore must be non-negative itself. The chi-square distribution is non-symmetric (asymmetric). There are many different chi-square distributions, one for each degree of freedom. The degrees of freedom when working with a single population variance is n - 1.
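A sketch of the single-population-variance use just described, with n - 1 degrees of freedom (the data and the hypothesized variance are invented for illustration):

```python
# Sketch: chi-square test for a single population variance.
# H0: population variance equals sigma0_sq. Data invented for illustration.
import numpy as np
from scipy.stats import chi2

data = np.array([10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 9.9, 10.3])
sigma0_sq = 0.10                    # hypothesized population variance
n = len(data)
stat = (n - 1) * data.var(ddof=1) / sigma0_sq
p_upper = chi2.sf(stat, df=n - 1)   # upper-tail p-value, df = n - 1
print(f"chi2 = {stat:.3f}, df = {n - 1}, p = {p_upper:.3f}")

# The distribution is non-negative and becomes less skewed as df grows.
print(chi2.stats(2, moments='s'), chi2.stats(50, moments='s'))
```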
t-distribution: The shape varies with the degrees of freedom, which are based on sample size. With a large sample size, the t-distribution becomes more like the z-distribution, because df and sample size are large. It consists of both positive and negative values.

F-distribution: The shape varies with two degrees of freedom, called the numerator and denominator df. The first is associated with the number of groups being compared; the second is associated with sample size. It consists of only positive values and is therefore positively skewed.

Chi-square distribution: The shape varies with its degrees of freedom. The value of chi-square is never negative; therefore it is positively skewed.
Types of Test
Parametric
Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters
Major classes of parametric tests are the t-test, analysis of variance (ANOVA), and the Pearson product-moment correlation. They are based on specific assumptions. They are more powerful and preferred; however, they cannot always be used, because the assumptions on which they are based are not always met.
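A minimal sketch of those three parametric tests via scipy's standard functions (the group values below are invented for illustration):

```python
# Sketch: the three major parametric tests named above, via scipy.
# Group values invented for illustration.
from scipy.stats import ttest_ind, f_oneway, pearsonr

group1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group2 = [2.0, 3.0, 4.0, 5.0, 6.0]
group3 = [3.0, 4.0, 5.0, 6.0, 7.0]

t_stat, t_p = ttest_ind(group1, group2)         # independent-samples t-test
F_stat, F_p = f_oneway(group1, group2, group3)  # one-way analysis of variance
r, r_p = pearsonr(group1, group2)               # Pearson product-moment correlation

print(f"t = {t_stat:.2f}, F = {F_stat:.2f}, r = {r:.2f}")
# -> t = -1.00, F = 2.00, r = 1.00 for these toy groups
```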
Non-parametric
Use rank and frequency distributions to draw conclusions about the distribution of population parameters
Major non-parametric tests include chi-square and the Spearman rank-order correlation (rho).
They are used when those assumptions are not met. They are considered less powerful; however, they are used just as often in actual research, because the assumptions of parametric tests frequently do not hold.
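A comparable sketch for the non-parametric tests just mentioned, again using scipy's standard functions on invented data:

```python
# Sketch: common non-parametric tests, via scipy. Data invented.
import numpy as np
from scipy.stats import spearmanr, chi2_contingency

# Spearman rank-order correlation works on ranks, so any monotonic
# relationship (even a non-linear one like y = x^2) gives rho = 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho, rho_p = spearmanr(x, y)

# Chi-square test of independence on a 2x2 frequency table.
table = np.array([[20, 10],
                  [12, 18]])
chi2_stat, chi2_p, df, expected = chi2_contingency(table)

print(f"rho = {rho:.2f}, chi2 = {chi2_stat:.2f}, df = {df}")
```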
A key assumption of parametric tests is that of homogeneous (equal) group variances. This can also be tested statistically; we will practice how to compute it in the next class.
What to do after checking homogeneity of variance? If the variances of the groups are found to differ significantly, non-parametric tests must be used. If the sample sizes of the groups being compared are the same, differences in the variances of the groups are less of a concern. Researchers often design their studies to have equal sample sizes in the two groups.
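The slides describe this check in SPSS; as a hedged sketch outside SPSS, homogeneity of variance can be tested with scipy's Levene test (the two groups below are invented, with the second deliberately more spread out):

```python
# Sketch: Levene's test for homogeneity (equality) of group variances.
# H0: the groups have equal variances. Data invented for illustration.
from scipy.stats import levene

group_a = [12.0, 14.5, 13.2, 15.1, 14.0, 13.6]
group_b = [11.8, 19.9, 9.5, 21.3, 8.7, 20.1]   # visibly more spread out

stat, p = levene(group_a, group_b)
if p < 0.05:
    print(f"p = {p:.3f}: variances differ; prefer a non-parametric test")
else:
    print(f"p = {p:.3f}: no significant difference in variances")
```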
Comparing parametric and non-parametric tests: have you noted which type of sample statistics we use for parametric and non-parametric tests?

Parametric: Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters. Interval and ratio data meet this need. There is some controversy about using parametric tests with ordinal measurements, where their validity is questionable.

Non-parametric: Use rank and frequency distributions to draw conclusions about the distribution of population parameters. Nominal and ranked ordinal data meet this need. Interval and ratio data can be converted into ranks or grouped into categories to meet this need.
Categories
1 = Maximal, 2 = Moderate, 3 = Minimal, 4 = Standby, 5 = No Assistance
These group means have been found to be significantly different from one another.
If the researchers believe that the real interval between maximal and moderate is greater than the interval between standby and no assistance, they might interpret the difference between 1.0 and 2.0 as more clinically important than the difference between standby and no assistance.
Steps
In tests of normality, the null hypothesis (Ho) is that the data is normally distributed, and the alternative hypothesis (Ha) is that the data is not normally distributed.
Actions:
Step 1: Select "Analyze -> Descriptive Statistics -> Explore". From the list on the left, move the variable "Age" to the "Dependent List".
Step 2: Click "Plots" on the right. A new window will appear. Check "None" for boxplot, uncheck everything under descriptives, and make sure the box "Normality plots with tests" is checked.
Step 3: The results now appear in the "Output" window.
Step 4: Interpret the result. Look at the third table, where two tests for normality are run. For datasets smaller than 2000 elements, we use the Shapiro-Wilk test; otherwise, the Kolmogorov-Smirnov test is used. If the Sig. value of the Shapiro-Wilk test is greater than 0.05, the data is normal. If it is below 0.05, the data significantly deviates from a normal distribution.
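The same Shapiro-Wilk decision rule can be reproduced outside SPSS. A sketch in Python, where the "Age" sample is simulated rather than taken from the course dataset:

```python
# Sketch: Shapiro-Wilk normality test, mirroring the SPSS decision rule.
# H0: the data come from a normal distribution. Sample is simulated.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)
age = rng.normal(loc=35, scale=8, size=100)   # simulated "Age" variable

stat, p = shapiro(age)
if p > 0.05:
    print(f"W = {stat:.3f}, Sig. = {p:.3f}: data consistent with normality")
else:
    print(f"W = {stat:.3f}, Sig. = {p:.3f}: data deviate from normality")
```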
Graphical Method
Normal quantile-quantile plot (Q-Q
plot) is the most commonly used and effective diagnostic tool for checking normality of the data.
It is constructed by plotting the
empirical quantiles of the data against corresponding quantiles of the normal distribution.
If the empirical distribution of the data
is approximately normal, the quantiles of the data will closely match the normal quantiles, and the points on the plot will fall near the line y=x.
It is impossible to fit a perfectly straight line in a Q-Q plot of real data, because random fluctuations cause the points to drift away and aberrant observations often contaminate the samples. Only large or systematic departures from the line indicate non-normality of the data; the points will remain reasonably close to the line if there is just natural variability. Therefore, the straightness of the normal Q-Q plot helps us judge whether the data has the same distribution shape as a normal distribution, while shifts and tilts away from the line y = x indicate differences in location and spread, respectively.
If the data are normally distributed, the points will be close to the diagonal line. If the data points stray from the line in an obvious non-linear fashion, the data are not normally distributed.
If you are at all unsure of being able to correctly interpret the graph, rely on the numerical methods instead, because it can take a fair bit of experience to correctly judge the normality of data based on plots.
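The straightness check can itself be quantified. As a sketch, scipy's probplot returns the least-squares line through the Q-Q points and its correlation r, which is near 1 for approximately normal data (the sample here is simulated):

```python
# Sketch: numerical summary of a normal Q-Q plot via scipy's probplot.
# r close to 1 means the points lie near a straight line. Data simulated.
import numpy as np
from scipy.stats import probplot

rng = np.random.default_rng(0)
sample = rng.normal(size=200)

(theoretical_q, ordered_vals), (slope, intercept, r) = probplot(sample, dist="norm")
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.3f}")
# Passing plot=ax (a matplotlib axes) instead would draw the Q-Q plot itself.
```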
If the histogram of the data approximates a bell curve, it suggests that the data may have come from a normal population.
If we readjust outlier
Since the skewness and (excess) kurtosis of the normal distribution are zero, values for these two parameters should be close to zero for data to follow a normal distribution. A rough measure of the standard error of the skewness is √(6/n), where n is the sample size. A rough measure of the standard error of the kurtosis is √(24/n), where n is the sample size. If the absolute value of the skewness for the data is more than twice its standard error, this indicates that the data are not symmetric, and therefore not normal. Similarly, if the absolute value of the kurtosis for the data is more than twice its standard error, this is also an indication that the data are not normal.
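This rule of thumb is easy to code. A sketch using scipy's skew and (Fisher, i.e. excess) kurtosis together with the rough standard errors √(6/n) and √(24/n); the helper name looks_normal and the toy samples are my own:

```python
# Sketch: two-standard-error rule of thumb for skewness and kurtosis.
# Uses the rough standard errors sqrt(6/n) and sqrt(24/n).
from math import sqrt
from scipy.stats import skew, kurtosis

def looks_normal(data):
    n = len(data)
    se_skew = sqrt(6 / n)
    se_kurt = sqrt(24 / n)
    g1 = skew(data)        # 0 for a perfectly symmetric sample
    g2 = kurtosis(data)    # Fisher definition: 0 for a normal sample
    symmetric = abs(g1) <= 2 * se_skew
    normal_tails = abs(g2) <= 2 * se_kurt
    return symmetric and normal_tails

# A symmetric, roughly bell-shaped toy sample passes; a heavily
# right-skewed one fails on the skewness criterion.
print(looks_normal([1, 2, 2, 3, 3, 3, 4, 4, 5]))
print(looks_normal([1, 1, 1, 1, 1, 1, 1, 1, 10]))
```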
Example
Suppose that in a data set the skewness of the data is 0.23 (absolute value 0.23) and the kurtosis is -1.53 (absolute value 1.53). The standard error for the skewness is 0.55 and the standard error for the kurtosis is 1.10. Neither value is more than twice its standard error. As in the previous slide, if the absolute value of the skewness or kurtosis for the data were more than twice the standard error, this would indicate that the data are not symmetric, and therefore not normal. Here both statistics are within two standard errors, which suggests that the data are likely to be normally distributed.
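The numbers in this example can be reproduced directly. Working back from SE_skew = 0.55, the sample size behind the slide's figures appears to be about n = 20 (an inference, since n is not stated):

```python
# Sketch: reproducing the example's standard errors, assuming n = 20
# (n is not given on the slide; 20 is inferred from SE_skew = 0.55).
from math import sqrt

n = 20
se_skew = sqrt(6 / n)    # ~0.55
se_kurt = sqrt(24 / n)   # ~1.10

skewness, kurt = 0.23, -1.53
print(f"SE_skew = {se_skew:.2f}, SE_kurt = {se_kurt:.2f}")
print(abs(skewness) <= 2 * se_skew)  # within two standard errors
print(abs(kurt) <= 2 * se_kurt)      # within two standard errors
```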