
Statistical Analysis of Difference

The Basics

Introduction
When are statistical tests used?

When researchers want to determine whether a statistically significant difference exists between two or more sets of numbers. The decision to reject or accept the null hypothesis is based on whether or not the observed values fall in the critical region.

What will we try to learn in the next few classes?
- Data handling
- Use of specific statistical tests
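The critical-region decision rule can be sketched in a few lines of code. A minimal Python illustration with scipy; the observed t value, degrees of freedom, and alpha below are hypothetical, not taken from the slides:

```python
# A minimal sketch of the critical-region decision for a two-tailed
# one-sample t-test at alpha = .05 (illustrative values only).
from scipy import stats

t_observed = 2.31   # hypothetical observed t statistic
df = 19             # hypothetical degrees of freedom (n = 20)
alpha = 0.05

# Critical value: the boundary of the rejection region.
t_critical = stats.t.ppf(1 - alpha / 2, df)

if abs(t_observed) > t_critical:
    print(f"|t| = {abs(t_observed):.2f} > {t_critical:.2f}: reject H0")
else:
    print(f"|t| = {abs(t_observed):.2f} <= {t_critical:.2f}: retain H0")
```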

Distribution for Analysis of Difference


What types of distributions have you learned so far?
- Standard normal distributions
- Z-scores
- When we use these distributions, we assume that the population standard deviation is known.
- Because the population standard deviation is usually not known, we cannot ordinarily use the standard normal distribution and its z-scores to draw statistical conclusions from samples.

Distribution for Analysis of Difference


What should we do then? Researchers conduct most statistical tests using distributions that resemble the normal distribution but are altered somewhat to account for the errors that are made when population parameters are not known.

The three most common distributions used are the t, F, and chi-square distributions.

How do we use these distributions?

- Just as we determine the probability of certain z-scores based on the standard normal distribution, we can determine the probability of obtaining certain t, F, and chi-square statistics based on their respective distributions.
- The decision to reject or accept the null hypothesis is based on whether or not the observed values fall in the critical region.
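In Python, scipy.stats exposes each of these distributions, so the probability of obtaining a given statistic can be looked up directly. A minimal sketch; the statistics and degrees of freedom are hypothetical:

```python
# Probability of a value at least this extreme under each distribution
# (all numbers are illustrative, not from the slides).
from scipy import stats

p_t = 2 * stats.t.sf(abs(2.10), df=15)     # two-tailed t-test
p_f = stats.f.sf(4.20, dfn=2, dfd=27)      # F-test (right tail only)
p_chi2 = stats.chi2.sf(7.80, df=3)         # chi-square test (right tail)

print(f"t:          p = {p_t:.4f}")
print(f"F:          p = {p_f:.4f}")
print(f"chi-square: p = {p_chi2:.4f}")
```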

What are the shapes of these distributions?

What influences the shapes of these distributions?


Degrees of Freedom
The degrees of freedom are calculated in different ways for the different distributions but in general are related to two things:
1. Number of participants in the study
2. Number of levels of the independent variable
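As a concrete illustration, these two quantities combine differently for different tests. A small sketch using the standard formulas; the group sizes are hypothetical:

```python
# Degrees of freedom for two common designs (hypothetical sample sizes).
n = 20          # participants in a single group
k = 3           # levels of the independent variable (groups)
N = 60          # total participants across all groups

df_one_sample_t = n - 1    # one-sample t-test
df_between = k - 1         # one-way ANOVA, numerator (between groups)
df_within = N - k          # one-way ANOVA, denominator (within groups)

print(df_one_sample_t, df_between, df_within)   # 19 2 57
```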

t-distribution

The figure shows the shape of the t-distribution in comparison to the standard normal (or z) distribution. Notice that the t-distribution becomes flatter with smaller values of n.

t-distribution
Some characteristics of the t-distribution, also known as Student's t distribution:
1. The mean of the distribution is equal to 0.
2. The variance is equal to v / (v - 2), where v is the degrees of freedom and v > 2.
3. The standard deviation is always greater than 1.
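These characteristics can be verified directly against scipy's t distribution. A minimal sketch; v = 10 is an arbitrary choice:

```python
# Checking the t-distribution's mean, variance v/(v-2), and SD > 1.
from scipy import stats

v = 10
print(stats.t.mean(df=v))               # 0.0
print(stats.t.var(df=v), v / (v - 2))   # both 1.25
print(stats.t.std(df=v) > 1)            # True: SD exceeds 1 for v > 2
```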

F-distribution

The shape of the F-distribution depends on the degrees of freedom of both the numerator and the denominator. Red has df1 = 2 and df2 = 3, blue has df1 = 4 and df2 = 30, and black has df1 = 20 and df2 = 20.

F-distribution
Characteristics of the F-distribution

1. It is not symmetric; the F-distribution is skewed to the right, that is, positively skewed.
2. The shape of the F-distribution depends upon the degrees of freedom in the numerator and the denominator.
3. The total area under the curve is 1.
4. The values of F are always greater than or equal to zero; the F-distribution cannot be negative.

The F-distribution is used to test whether two population variances are the same.
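Because F is a ratio of two variances, the variance test itself reduces to a few lines. A minimal sketch on hypothetical samples:

```python
# A variance-ratio F-test on two hypothetical samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=50, scale=5, size=15)
b = rng.normal(loc=50, scale=5, size=12)

f = np.var(a, ddof=1) / np.var(b, ddof=1)   # ratio of sample variances
p = 2 * min(stats.f.sf(f, len(a) - 1, len(b) - 1),
            stats.f.cdf(f, len(a) - 1, len(b) - 1))   # two-tailed p-value
print(f"F = {f:.3f}, p = {p:.3f}")
```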

Chi-square distribution

Notice in this figure that as df gets large, the curve becomes less skewed and more normal.

Properties of Chi-square distribution


- Chi-square is non-negative: it is the ratio of two non-negative values, and therefore must be non-negative itself.
- Chi-square is non-symmetric (asymmetric).
- There are many different chi-square distributions, one for each degree of freedom.
- The degrees of freedom when working with a single population variance is n - 1.
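The "squared z-scores" origin noted in the comparison below can be checked by simulation: a sum of k squared z-scores follows a chi-square distribution with k degrees of freedom. A minimal sketch; k is arbitrary:

```python
# Simulating chi-square(k) as a sum of k squared standard normals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k = 3
z = rng.standard_normal(size=(100_000, k))
chi2_samples = (z ** 2).sum(axis=1)

# Simulated mean and variance should be close to the theoretical k and 2k.
print(chi2_samples.mean(), stats.chi2.mean(df=k))   # ~3.0, 3.0
print(chi2_samples.var(), stats.chi2.var(df=k))     # ~6.0, 6.0
```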

Let's compare and review the three distributions

t-distribution
- A symmetric distribution, consisting of both positive and negative values.
- Shape varies with the degrees of freedom, which are based on sample size.
- With a large sample size, the t-distribution becomes more like the z-distribution, because the df and sample size are large.

F-distribution
- Non-symmetric. Why asymmetric? Because it is obtained from the squared scores of the t-statistic.
- Shape depends on two degrees of freedom, called the numerator and denominator df: the first is associated with the number of groups being compared, the second with sample size.
- Consists of only positive values and is therefore positively skewed.

Chi-square distribution
- Non-symmetric, although as the df increases it becomes more symmetric.
- Obtained from the distribution of squared z-scores.
- Shape varies with its degrees of freedom.
- The value of chi-square is never negative; it is therefore positively skewed.

Types of Tests

Parametric
- Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters.
- Major classes of parametric tests are the t-test, analysis of variance, and the Pearson product-moment correlation.
- Based on specific assumptions.
- More powerful and preferred; however, they cannot always be used, because the assumptions on which they are based are not always met.

Non-parametric
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Examples are chi-square and Spearman rank-order rho.
- Used when the parametric assumptions are not met.
- Considered less powerful, yet used just as often in actual research.
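To make the contrast concrete, here is a minimal Python sketch running a parametric test and a common non-parametric counterpart on the same two hypothetical groups:

```python
# Parametric vs. non-parametric test of the same group difference
# (the samples are hypothetical).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=52, scale=8, size=20)
group_b = rng.normal(loc=47, scale=8, size=20)

t_stat, t_p = stats.ttest_ind(group_a, group_b)       # parametric
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)    # non-parametric
print(f"t-test:       p = {t_p:.3f}")
print(f"Mann-Whitney: p = {u_p:.3f}")
```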

Assumptions of Tests of Difference


Assumptions for parametric tests:
- Random selection from a normally distributed population
- Homogeneity of variance
- Level of measurement (controversial)

Assumption 1: Random Selection from a normally distributed population


- Participants are randomly selected from normally distributed populations.
- Data sets that are only approximately normally distributed are still accepted.
- The extent to which a data set is normally distributed can be tested (we will practice this today in SPSS).
- When a data set is not normally distributed, one strategy is to transform (convert) the data and then use parametric tests on the transformed data; otherwise, non-parametric tests can be used.
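One common transformation is the logarithm, which often normalizes positively skewed data. A minimal sketch with hypothetical data; whether the transform succeeds depends on the particular sample:

```python
# Transform-then-test strategy: log transform for right-skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
skewed = rng.lognormal(mean=0, sigma=0.8, size=30)   # right-skewed sample

print(stats.shapiro(skewed).pvalue)          # often < .05: not normal
print(stats.shapiro(np.log(skewed)).pvalue)  # often > .05 after the transform
```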

Assumption 2: Homogeneity of Variance


- The population variances of the groups being tested are equal, that is, homogeneous.
- This can also be tested statistically; we will practice how to compute it in the next class.

What to do after checking homogeneity of variance?
- If the variances of the groups are found to differ significantly, non-parametric tests must be used.
- If the sample sizes of the groups being compared are the same, differences in the variances of the groups become less of a concern.
- Researchers therefore often design their studies to have equal sample sizes in the two groups.
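One common statistical check for this assumption is Levene's test (SPSS reports it in its t-test and ANOVA output). A minimal sketch with hypothetical samples:

```python
# Homogeneity-of-variance check via Levene's test (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=50, scale=5, size=25)
group_b = rng.normal(loc=50, scale=12, size=25)   # deliberately larger spread

stat, p = stats.levene(group_a, group_b)
if p < 0.05:
    print(f"p = {p:.3f}: variances differ; consider a non-parametric test")
else:
    print(f"p = {p:.3f}: no evidence against equal variances")
```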

Assumption 3: Level of Measurement


- Do you know what the levels of measurement are?
- In the earlier slide comparing parametric and non-parametric tests, did you notice which type of sample statistics we use for each?

Parametric
- Use sample statistics such as the mean, standard deviation, and variance to estimate differences between population parameters.
- Interval and ratio data meet this need.
- The controversy concerns the use of parametric tests with ordinal measurements, for which these statistics are not strictly valid.

Non-parametric
- Use rank and frequency distributions to draw conclusions about the distribution of population parameters.
- Nominal and ranked ordinal data meet this need.
- Interval and ratio data can be converted into ranks or grouped into categories to meet this need.

Assumption 3: Level of Measurement


Note: Regardless of the origin of the numbers, parametric tests can be conducted as long as the data themselves meet the assumptions of parametric tests. However, the researcher must interpret parametric statistical conclusions based on ordinal data in light of their clinical and practical implications. This can be illustrated with an example.

Assumption 3: Level of Measurement


Example from rehabilitation research
Variable: the amount of assistance a patient needs to accomplish various functional tasks. The categories are:

Code  Category
1     Maximal
2     Moderate
3     Minimal
4     Standby
5     No Assistance

Mean scores of the groups: 1.0, 2.0, 3.0, 4.0, 5.0

These group means have been found to be significantly different from one another.

If the researchers believe that the real interval between maximal and moderate assistance is greater than the interval between standby and no assistance, they might interpret the difference between 1.0 and 2.0 as more clinically important than the difference between 4.0 and 5.0.

Checking Normality of Data


If we learn it by example: in this example the null hypothesis (H0) is that the data are normally distributed, and the alternative hypothesis (Ha) is that the data are not normally distributed.

Steps
1. Select "Analyze -> Descriptive Statistics -> Explore". From the list on the left, move the variable "Age" to the "Dependent List".
2. Click "Plots" on the right. A new window will appear. Check "None" for boxplot, uncheck everything under descriptives, and make sure the box "Normality plots with tests" is checked.
3. The results now appear in the "Output" window.
4. Interpret the result: look at the third table, where two tests for normality are run. For data sets smaller than 2000 elements we use the Shapiro-Wilk test; otherwise, the Kolmogorov-Smirnov test is used. If the Sig. value of the Shapiro-Wilk test is greater than 0.05, the data are normal; if it is below 0.05, the data deviate significantly from a normal distribution.
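The same check can be run outside SPSS. A minimal sketch using scipy's Shapiro-Wilk test on hypothetical age data:

```python
# Shapiro-Wilk normality test, the Python counterpart of the SPSS steps above.
import numpy as np
from scipy import stats

age = np.array([23, 25, 31, 28, 45, 27, 33, 29, 35, 26,
                30, 24, 38, 32, 27, 29, 41, 22, 36, 28])

stat, p = stats.shapiro(age)   # appropriate for n < 2000
if p > 0.05:
    print(f"p = {p:.3f}: no evidence against normality")
else:
    print(f"p = {p:.3f}: data deviate significantly from normal")
```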

Graphical Method
- The normal quantile-quantile plot (Q-Q plot) is the most commonly used and effective diagnostic tool for checking normality of the data.
- It is constructed by plotting the empirical quantiles of the data against the corresponding quantiles of the normal distribution.
- If the empirical distribution of the data is approximately normal, the quantiles of the data will closely match the normal quantiles, and the points on the plot will fall near the line y = x.

Graphical Method
- It is impossible to fit a straight line perfectly in a Q-Q plot of real data, because random fluctuations cause the points to drift away from the line and aberrant observations often contaminate the samples.
- Only large or systematic departures from the line indicate non-normality of the data. The points will remain reasonably close to the line if there is just natural variability.
- Therefore, the straightness of the normal Q-Q plot helps us judge whether the data have the same distribution shape as a normal distribution, while shifts and tilts away from the line y = x indicate differences in location and spread, respectively.

Graphical Method (Q-Q plot interpretation points)

- If the data are normally distributed, the data points will be close to the diagonal line. If the data points stray from the line in an obviously non-linear fashion, the data are not normally distributed.
- If you are at all unsure of being able to interpret the graph correctly, rely on the numerical methods instead, because it can take a fair bit of experience to judge the normality of data from plots.
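A Q-Q plot like the one described above can be produced directly with scipy and matplotlib. A minimal sketch on a hypothetical sample:

```python
# Normal Q-Q plot: points near the reference line suggest normality.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(4)
data = rng.normal(loc=30, scale=6, size=50)

stats.probplot(data, dist="norm", plot=plt)
plt.title("Normal Q-Q plot")
plt.show()
```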

Normality Check: Another Way

- Histogram: when a histogram's shape approximates a bell curve, it suggests that the data may have come from a normal population.

Example from the Data Set

- In both plots, there is a single value that appears to be considerably different from the rest: an outlier. This happens to be observation number 5 in the data set.

If we readjust the outlier

Analysis of Skewness and Kurtosis


- Since the skewness and kurtosis of the normal distribution are zero, values for these two parameters should be close to zero for data to follow a normal distribution.
- A rough measure of the standard error of the skewness is √(6/n), where n is the sample size.
- A rough measure of the standard error of the kurtosis is √(24/n), where n is the sample size.
- If the absolute value of the skewness for the data is more than twice the standard error, this indicates that the data are not symmetric, and therefore not normal. Similarly, if the absolute value of the kurtosis is more than twice the standard error, this is also an indication that the data are not normal.

Example
- Suppose that in a data set the skewness is .23 (absolute value) and the kurtosis is -1.53 (absolute value 1.53), with a standard error of .55 for the skewness and 1.10 for the kurtosis.
- Neither absolute value is close to twice its standard error. Recall from the previous slide: only if the absolute value of the skewness or kurtosis exceeds twice its standard error do we conclude that the data are not symmetric, and therefore not normal.
- Both statistics are within two standard errors, which suggests that the data are likely to be normally distributed.
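The twice-the-standard-error rule of thumb is easy to automate. A minimal sketch on a hypothetical sample, using the formulas from the slide:

```python
# Skewness/kurtosis check against twice their rough standard errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = rng.normal(loc=100, scale=15, size=20)
n = len(data)

skew = stats.skew(data)
kurt = stats.kurtosis(data)     # excess kurtosis: 0 for a normal distribution
se_skew = np.sqrt(6 / n)        # rough SE of skewness
se_kurt = np.sqrt(24 / n)       # rough SE of kurtosis

print(abs(skew) < 2 * se_skew)  # True suggests acceptable symmetry
print(abs(kurt) < 2 * se_kurt)  # True suggests acceptable kurtosis
```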
