
Statistical Analysis of Difference

The Basics

Introduction
When are statistical tests used?

When researchers want to determine whether a statistically significant difference exists between two or more sets of numbers. The decision to reject or accept the null hypothesis is based on whether or not the observed values fall in the critical region.

What will we try to learn in the next few classes?
- Data handling
- Use of specific statistical tests
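The critical-region decision rule can be sketched in a few lines of code. A minimal Python illustration with scipy; the observed t value, degrees of freedom, and alpha below are hypothetical, not taken from the slides:

```python
# A minimal sketch of the critical-region decision for a two-tailed
# one-sample t-test at alpha = .05 (illustrative values only).
from scipy import stats

t_observed = 2.31   # hypothetical observed t statistic
df = 19             # hypothetical degrees of freedom (n = 20)
alpha = 0.05

# Critical value: the boundary of the rejection region.
t_critical = stats.t.ppf(1 - alpha / 2, df)

if abs(t_observed) > t_critical:
    print(f"|t| = {abs(t_observed):.2f} > {t_critical:.2f}: reject H0")
else:
    print(f"|t| = {abs(t_observed):.2f} <= {t_critical:.2f}: retain H0")
```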

Distribution for Analysis of Difference


What types of distributions have you learned so far?
- Standard normal distributions
- Z-scores
- When we use these distributions, we assume that the population standard deviation is known.
- Because the population standard deviation is usually not known, we cannot ordinarily use the standard normal distribution and its z-scores to draw statistical conclusions from samples.

Distribution for Analysis of Difference


What should we do then? Researchers conduct most statistical tests using distributions that resemble the normal distribution but are altered somewhat to account for the errors that are made when population parameters are not known.

The three most common distributions used are the t, F, and chi-square distributions.

How do we use these distributions?

- Just as we determine the probability of certain z-scores based on the standard normal distribution, we can determine the probability of obtaining certain t, F, and chi-square statistics based on their respective distributions.
- The decision to reject or accept the null hypothesis is based on whether or not the observed values fall in the critical region.
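In Python, scipy.stats exposes each of these distributions, so the probability of obtaining a given statistic can be looked up directly. A minimal sketch; the statistics and degrees of freedom are hypothetical:

```python
# Probability of a value at least this extreme under each distribution
# (all numbers are illustrative, not from the slides).
from scipy import stats

p_t = 2 * stats.t.sf(abs(2.10), df=15)     # two-tailed t-test
p_f = stats.f.sf(4.20, dfn=2, dfd=27)      # F-test (right tail only)
p_chi2 = stats.chi2.sf(7.80, df=3)         # chi-square test (right tail)

print(f"t:          p = {p_t:.4f}")
print(f"F:          p = {p_f:.4f}")
print(f"chi-square: p = {p_chi2:.4f}")
```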

What are the shapes of these distributions?

What influences the shapes of these distributions?


Degrees of Freedom
The degrees of freedom are calculated in different ways for the different distributions but in general are related to two things:
1. Number of participants in the study
2. Number of levels of the independent variable
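As a concrete illustration, these two quantities combine differently for different tests. A small sketch using the standard formulas; the group sizes are hypothetical:

```python
# Degrees of freedom for two common designs (hypothetical sample sizes).
n = 20          # participants in a single group
k = 3           # levels of the independent variable (groups)
N = 60          # total participants across all groups

df_one_sample_t = n - 1    # one-sample t-test
df_between = k - 1         # one-way ANOVA, numerator (between groups)
df_within = N - k          # one-way ANOVA, denominator (within groups)

print(df_one_sample_t, df_between, df_within)   # 19 2 57
```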

t-distribution

The figure shows the shape of the t-distribution in comparison to the standard normal (or z) distribution. Notice that the t-distribution becomes flatter with smaller values of n.

t-distribution
Some characteristics of the t-distribution, also known as Student's t distribution:
1. The mean of the distribution is equal to 0.
2. The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
3. The standard deviation is always greater than 1.
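These characteristics can be verified directly against scipy's t distribution. A minimal sketch; v = 10 is an arbitrary choice:

```python
# Checking the t-distribution's mean, variance v/(v-2), and SD > 1.
from scipy import stats

v = 10
print(stats.t.mean(df=v))               # 0.0
print(stats.t.var(df=v), v / (v - 2))   # both 1.25
print(stats.t.std(df=v) > 1)            # True: SD exceeds 1 for v > 2
```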

F-distribution

The shape of the F-distribution depends on the degrees of freedom of both the numerator and the denominator. Red has df1 = 2 and df2 = 3, blue has df1 = 4 and df2 = 30, and black has df1 = 20 and df2 = 20.

F-distribution
Characteristics of the F-distribution

1. It is not symmetric; the F-distribution is skewed to the right, that is, positively skewed.
2. The shape of the F-distribution depends upon the degrees of freedom in the numerator and the denominator.
3. The total area under the curve is 1.
4. The values of F are always greater than or equal to zero; the F-distribution cannot be negative.

The F-distribution is used to test whether two population variances are the same.
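Because F is a ratio of two variances, the variance test itself reduces to a few lines. A minimal sketch on hypothetical samples:

```python
# A variance-ratio F-test on two hypothetical samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=50, scale=5, size=15)
b = rng.normal(loc=50, scale=5, size=12)

f = np.var(a, ddof=1) / np.var(b, ddof=1)   # ratio of sample variances
p = 2 * min(stats.f.sf(f, len(a) - 1, len(b) - 1),
            stats.f.cdf(f, len(a) - 1, len(b) - 1))   # two-tailed p-value
print(f"F = {f:.3f}, p = {p:.3f}")
```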

Chi-square distribution

Notice in this figure that as df gets large, the curve becomes less skewed and more normal.

Properties of Chi-square distribution


- Chi-square is non-negative: it is the ratio of two non-negative values, and therefore must be non-negative itself.
- Chi-square is non-symmetric (asymmetric).
- There are many different chi-square distributions, one for each degree of freedom.
- The degrees of freedom when working with a single population variance is n - 1.
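The "squared z-scores" origin noted in the comparison below can be checked by simulation: a sum of k squared z-scores follows a chi-square distribution with k degrees of freedom. A minimal sketch; k is arbitrary:

```python
# Simulating chi-square(k) as a sum of k squared standard normals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 3
z = rng.standard_normal(size=(100_000, k))
chi2_samples = (z ** 2).sum(axis=1)

# Simulated mean and variance should be close to the theoretical k and 2k.
print(chi2_samples.mean(), stats.chi2.mean(df=k))   # ~3.0, 3.0
print(chi2_samples.var(), stats.chi2.var(df=k))     # ~6.0, 6.0
```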

Let's compare and review the three distributions

t-distribution
- A symmetric distribution, consisting of both positive and negative values.
- Shape varies with the degrees of freedom, which are based on sample size.
- With a large sample size, the t-distribution becomes more like the z-distribution, because the df and sample size are large.

F-distribution
- Non-symmetric. Why asymmetric? Because it is obtained from the squared scores of the t-statistic.
- Shape depends on two degrees of freedom, called the numerator and denominator df: the first is associated with the number of groups being compared, the second with sample size.
- Consists of only positive values and is therefore positively skewed.

Chi-square distribution
- Non-symmetric, although as the df increases it becomes more symmetric.
- Obtained from the distribution of squared z-scores.
- Shape varies with its degrees of freedom.
- The value of chi-square is never negative; it is therefore positively skewed.

Types of Tests

Parametric
- Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters.
- Major classes of parametric tests are the t-test, analysis of variance, and the Pearson product-moment correlation.
- Based on specific assumptions.
- More powerful and preferred; however, they cannot always be used, because the assumptions on which they are based are not always met.

Non-parametric
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Examples are chi-square and Spearman rank-order rho.
- Used when the parametric assumptions are not met.
- Considered less powerful, yet used just as often in actual research.
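To make the contrast concrete, here is a minimal Python sketch running a parametric test and a common non-parametric counterpart on the same two hypothetical groups:

```python
# Parametric vs. non-parametric test of the same group difference
# (the samples are hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=52, scale=8, size=20)
group_b = rng.normal(loc=47, scale=8, size=20)

t_stat, t_p = stats.ttest_ind(group_a, group_b)       # parametric
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)    # non-parametric
print(f"t-test:       p = {t_p:.3f}")
print(f"Mann-Whitney: p = {u_p:.3f}")
```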

Assumptions of Tests of Difference


Assumptions for parametric tests:
- Random selection from a normally distributed population
- Homogeneity of variance
- Level of measurement (controversial)

Assumption 1: Random Selection from a normally distributed population


- Participants are randomly selected from normally distributed populations.
- Data sets that are only approximately normally distributed are still accepted.
- The extent to which a data set is normally distributed can be tested (we will practice this today in SPSS).
- When a data set is not normally distributed, one strategy is to transform (convert) the data and then use parametric tests on the transformed data; otherwise, non-parametric tests can be used.
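One common transformation is the logarithm, which often normalizes positively skewed data. A minimal sketch with hypothetical data; whether the transform succeeds depends on the particular sample:

```python
# Transform-then-test strategy: log transform for right-skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
skewed = rng.lognormal(mean=0, sigma=0.8, size=30)   # right-skewed sample

print(stats.shapiro(skewed).pvalue)          # often < .05: not normal
print(stats.shapiro(np.log(skewed)).pvalue)  # often > .05 after the transform
```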

Assumption 2: Homogeneity of Variance


- The population variances of the groups being tested are equal, that is, homogeneous.
- This can also be tested statistically; we will practice how to compute it in the next class.

What to do after checking homogeneity of variance?
- If the variances of the groups are found to differ significantly, non-parametric tests must be used.
- If the sample sizes of the groups being compared are the same, differences in the variances of the groups become less of a concern.
- Researchers therefore often design their studies to have equal sample sizes in the two groups.
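One common statistical check for this assumption is Levene's test (SPSS reports it in its t-test and ANOVA output). A minimal sketch with hypothetical samples:

```python
# Homogeneity-of-variance check via Levene's test (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=50, scale=5, size=25)
group_b = rng.normal(loc=50, scale=12, size=25)   # deliberately larger spread

stat, p = stats.levene(group_a, group_b)
if p < 0.05:
    print(f"p = {p:.3f}: variances differ; consider a non-parametric test")
else:
    print(f"p = {p:.3f}: no evidence against equal variances")
```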

Assumption 3: Level of Measurement


- Do you know what the levels of measurement are?
- In the earlier slide comparing parametric and non-parametric tests, did you notice which type of sample statistics we use for each?

Parametric
- Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters.
- Interval and ratio data meet this need.
- The controversy concerns the use of parametric tests with ordinal measurements, for which these statistics are not strictly valid.

Non-parametric
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Nominal and ranked ordinal data meet this need.
- Interval and ratio data can be converted into ranks or grouped into categories to meet this need.

Assumption 3: Level of Measurement


Note: Regardless of the origin of the numbers, parametric tests can be conducted as long as the data themselves meet the assumptions of parametric tests. However, the researcher must interpret parametric statistical conclusions based on ordinal data in light of their clinical and practical implications. This can be illustrated with an example.

Assumption 3: Level of Measurement


Example from rehabilitation research
Variable: the amount of assistance a patient needs to accomplish various functional tasks. The categories are:

Code  Category
1     Maximal
2     Moderate
3     Minimal
4     Standby
5     No Assistance

Mean scores of the groups: 1.0, 2.0, 3.0, 4.0, 5.0

These group means have been found to be significantly different from one another.

If the researchers believe that the real interval between maximal and moderate assistance is greater than the interval between standby and no assistance, they might interpret the difference between 1.0 and 2.0 as more clinically important than the difference between 4.0 and 5.0.

Checking Normality of Data


If we learn it by example: in this example the null hypothesis (H0) is that the data are normally distributed, and the alternative hypothesis (Ha) is that the data are not normally distributed.

Steps
1. Select "Analyze -> Descriptive Statistics -> Explore". From the list on the left, move the variable "Age" to the "Dependent List".
2. Click "Plots" on the right. A new window will appear. Check "None" for boxplot, uncheck everything under descriptives, and make sure the box "Normality plots with tests" is checked.
3. The results now appear in the "Output" window.
4. Interpret the result: look at the third table, where two tests for normality are run. For data sets smaller than 2000 elements we use the Shapiro-Wilk test; otherwise, the Kolmogorov-Smirnov test is used. If the Sig. value of the Shapiro-Wilk test is greater than 0.05, the data are normal; if it is below 0.05, the data deviate significantly from a normal distribution.
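The same check can be run outside SPSS. A minimal sketch using scipy's Shapiro-Wilk test on hypothetical age data:

```python
# Shapiro-Wilk normality test, the Python counterpart of the SPSS steps above.
import numpy as np
from scipy import stats

age = np.array([23, 25, 31, 28, 45, 27, 33, 29, 35, 26,
                30, 24, 38, 32, 27, 29, 41, 22, 36, 28])

stat, p = stats.shapiro(age)   # appropriate for n < 2000
if p > 0.05:
    print(f"p = {p:.3f}: no evidence against normality")
else:
    print(f"p = {p:.3f}: data deviate significantly from normal")
```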

Graphical Method
- The normal quantile-quantile plot (Q-Q plot) is the most commonly used and effective diagnostic tool for checking normality of the data.
- It is constructed by plotting the empirical quantiles of the data against the corresponding quantiles of the normal distribution.
- If the empirical distribution of the data is approximately normal, the quantiles of the data will closely match the normal quantiles, and the points on the plot will fall near the line y = x.

Graphical Method
- It is impossible to fit a straight line perfectly in a Q-Q plot of real data, because random fluctuations cause the points to drift away from the line and aberrant observations often contaminate the samples.
- Only large or systematic departures from the line indicate non-normality of the data. The points will remain reasonably close to the line if there is just natural variability.
- Therefore, the straightness of the normal Q-Q plot helps us judge whether the data have the same distribution shape as a normal distribution, while shifts and tilts away from the line y = x indicate differences in location and spread, respectively.

Graphical Method (Q-Q plot interpretation points)

- If the data are normally distributed, the data points will be close to the diagonal line. If the data points stray from the line in an obviously non-linear fashion, the data are not normally distributed.
- If you are at all unsure of being able to interpret the graph correctly, rely on the numerical methods instead, because it can take a fair bit of experience to judge the normality of data from plots.
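A Q-Q plot like the one described above can be produced directly with scipy and matplotlib. A minimal sketch on a hypothetical sample:

```python
# Normal Q-Q plot: points near the reference line suggest normality.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(4)
data = rng.normal(loc=30, scale=6, size=50)

stats.probplot(data, dist="norm", plot=plt)
plt.title("Normal Q-Q plot")
plt.show()
```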

Normality Check: Another Way

- Histogram: when a histogram's shape approximates a bell curve, it suggests that the data may have come from a normal population.

Example from the Data Set

- In both plots, there is a single value that appears to be considerably different from the rest: an outlier. This happens to be observation number 5 in the data set.

If we readjust the outlier

Analysis of Skewness and Kurtosis


- Since the skewness and kurtosis of the normal distribution are zero, values for these two parameters should be close to zero for data to follow a normal distribution.
- A rough measure of the standard error of the skewness is √(6/n), where n is the sample size.
- A rough measure of the standard error of the kurtosis is √(24/n), where n is the sample size.
- If the absolute value of the skewness for the data is more than twice the standard error, this indicates that the data are not symmetric, and therefore not normal. Similarly, if the absolute value of the kurtosis is more than twice the standard error, this is also an indication that the data are not normal.

Example
- Suppose that in a data set the skewness is .23 (absolute value) and the kurtosis is -1.53 (absolute value 1.53), with a standard error of .55 for the skewness and 1.10 for the kurtosis.
- Neither absolute value is close to twice its standard error. Recall from the previous slide: only if the absolute value of the skewness or kurtosis exceeds twice its standard error do we conclude that the data are not symmetric, and therefore not normal.
- Both statistics are within two standard errors, which suggests that the data are likely to be normally distributed.
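The twice-the-standard-error rule of thumb is easy to automate. A minimal sketch on a hypothetical sample, using the formulas from the slide:

```python
# Skewness/kurtosis check against twice their rough standard errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(loc=100, scale=15, size=20)
n = len(data)

skew = stats.skew(data)
kurt = stats.kurtosis(data)     # excess kurtosis: 0 for a normal distribution
se_skew = np.sqrt(6 / n)        # rough SE of skewness
se_kurt = np.sqrt(24 / n)       # rough SE of kurtosis

print(abs(skew) < 2 * se_skew)  # True suggests acceptable symmetry
print(abs(kurt) < 2 * se_kurt)  # True suggests acceptable kurtosis
```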
