Statistical Analysis

Dr. Manjunath K R, IMSR
STATISTICAL ANALYSIS OF DATA
7/18/2013
What should a researcher know?
Contents
Introduction to Statistical Analysis
Types of Statistical Measures

Measures of central tendency
Measures of Dispersion
Measures of Association / Relations
Analysis of Variance
Analysis of Data
Critical examination of the assembled data:
For studying the characteristics of the object under study
For determining the pattern of relationships among the variables
Purpose of Statistical Analysis
Summarizes a large mass of data into an understandable form
Makes exact descriptions possible
Identifies causal factors
Helps to draw inferences and make predictions
Tests hypotheses
Aspects focused on in Statistical Analysis
Descriptive analysis
Comparison
Relationships among variables
Factor analysis
Descriptive Analysis
It describes how respondents are distributed across categories.
Measures of central tendency are used to describe their characteristics.
Comparison
Two or more distributions are compared to assess their relative spread.
Measures of dispersion such as range, standard deviation and coefficient of variation, together with ratios, proportions and percentages, are used for the comparison.
Nature of relationship among variables
Relationship between dependent and independent variables
Relationship among various variables
Correlation and regression analysis are used
Types of Statistical Measures
Measures of central tendency
Measures of Dispersion
Measures of Association / Relations
Analysis of Variance
Time series
Measures of Central Tendency
Mean, median and mode are known as descriptive measures.
They give a concise description of a group as a whole.
They facilitate comparison between groups.
Measures of Central Tendency Contd.
If the researcher wants a crude descriptive measure, or the most common or typical category, the mode is used.
If an average figure is required for further mathematical manipulation, the arithmetic mean is preferable.
To describe a skewed distribution, the median is preferred.
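The choice among the three measures can be seen in a small sketch. The income figures below are hypothetical, chosen so that one extreme value skews the distribution; only Python's standard `statistics` module is assumed.

```python
import statistics

# Hypothetical monthly incomes (in thousands); the last value is an outlier
# that skews the distribution to the right.
incomes = [20, 22, 22, 25, 27, 30, 120]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # middle value, robust to skew
mode_income = statistics.mode(incomes)      # most frequent (typical) value

print(mean_income, median_income, mode_income)
```

Here the mean (38) lies above six of the seven incomes, while the median (25) stays near the bulk of the data, which is why the median is preferred for skewed distributions.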
Measures of Dispersion
Range, mean deviation, standard deviation and coefficient of variation are the measures of dispersion.
Range is a crude measure of dispersion.
Mean deviation considers all values in the distribution, but it does not give much information about the distribution.
Standard deviation is the square root of the mean of the squared deviations from the arithmetic mean. It is amenable to further mathematical manipulation.
Measures of Dispersion Contd.
Variance is the square of the standard deviation.
Coefficient of variation is the standard deviation divided by the arithmetic mean, expressed as a percentage: $CV = \frac{\sigma}{\bar{X}} \times 100$.
It is useful for comparison, because CV indicates relative variation.
Standard deviation and the normal distribution: about 68% of values lie within mean ± 1 standard deviation, about 95% within mean ± 2, and about 99.7% within mean ± 3.
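As a rough sketch of how these measures are computed, the snippet below works through one small data set. The values are hypothetical, and the population form of the standard deviation (divisor N) is assumed, matching the "mean of squared deviations" definition.

```python
import statistics

# Hypothetical observations, chosen so the results are round numbers.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
value_range = max(values) - min(values)                  # crude: uses only two values
mean_deviation = sum(abs(v - mean) for v in values) / len(values)
std_dev = statistics.pstdev(values)                      # sqrt of mean squared deviation
variance = statistics.pvariance(values)                  # square of the standard deviation
cv_percent = std_dev / mean * 100                        # relative variation, in percent

print(value_range, mean_deviation, std_dev, variance, cv_percent)
```

Because CV is unit-free, it can compare the spread of two series measured on different scales, which the standard deviation alone cannot do.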
Measures of Association / Relationship
Identifying the relationship between two or more variables helps to draw conclusions. In examining a relationship, the following questions arise:
Is there a relationship between the variables under study?
What is the direction and degree of the relationship?
Is the relationship a causal one?
Is it statistically significant?
Measures of Association / Relationship Contd.
Cross tabulation and percentage difference are used to measure the association.
Ex: Preference of Brand A and Age
Age      Yes          No           Total
Youth    120 (60%)    80 (40%)     200
Adult    120 (60%)    80 (40%)     200
Total    240          160          400
Preference of Brand B and Age
Age      Yes          No           Total
Youth    140 (70%)    60 (30%)     200
Adult    20 (10%)     180 (90%)    200
Total    160          240          400
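The percentage difference for the two brand tables can be sketched as follows. The counts are taken from the slides; the function name `percentage_difference` is only an illustrative choice.

```python
# (yes_count, total) per age group, copied from the two cross tabulations.
brand_a = {"Youth": (120, 200), "Adult": (120, 200)}
brand_b = {"Youth": (140, 200), "Adult": (20, 200)}

def percentage_difference(table):
    """Percent answering Yes among Youth minus percent answering Yes among Adults."""
    youth_yes, youth_total = table["Youth"]
    adult_yes, adult_total = table["Adult"]
    return 100 * youth_yes / youth_total - 100 * adult_yes / adult_total

diff_a = percentage_difference(brand_a)  # 60% - 60%
diff_b = percentage_difference(brand_b)  # 70% - 10%
print(diff_a, diff_b)
```

A difference of 0 percentage points for Brand A suggests no association between age and preference, while 60 points for Brand B suggests a strong one.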
Correlation
Correlation is basically a measure of the degree of linear association between two variables.
Eg: income and level of education; type of seed and yield per acre; years of experience and yearly salary; demand and prices; income and savings; height and weight.
However, it does not necessarily imply that one is the cause of the other.
Correlation is an index of the direction (positive or negative) and intensity of association between two variables.
Interpretation
If the coefficient is zero or close to zero, the two variables are not linearly correlated.
Even when the values of two variables are correlated and r is calculated, this does not by itself establish a cause-and-effect relationship between the forces affecting the distribution of the items.
Eg: Income and Height.
Interpretation Continued
When r = +1, there is a perfect positive relationship.
When r = -1, there is a perfect negative relationship.
When r = 0, there is no linear relationship.
The closer r is to +1 or -1, the closer the relationship.
Full interpretation of r depends upon the circumstances of the individual case.
The closeness of the relationship is not proportional to r.
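Coefficient of Correlation and Probable Error
Probable error of r helps to interpret the reliability of the value of the coefficient of correlation.
The formula is $PE_r = 0.6745 \, \frac{1 - r^2}{\sqrt{N}}$, where N is the number of pairs of observations.
If r is more than six times the probable error, the correlation is practically certain and it is significant. If r is less than the probable error, there is no evidence of correlation.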
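A minimal sketch of r and its probable error, using the five (X, Y) pairs from the regression illustration in this deck. It assumes the usual form of the probable error, 0.6745 (1 - r²) / √N, and the usual rule that r is treated as significant when it exceeds six times the probable error.

```python
import math

# (X, Y) pairs from the deck's regression illustration.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of products and squares of deviations from the means.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)                      # Pearson's coefficient
probable_error = 0.6745 * (1 - r ** 2) / math.sqrt(n)  # reliability of r
is_significant = abs(r) > 6 * probable_error

print(round(r, 4), round(probable_error, 4), is_significant)
```

With only five pairs the probable error is a rough guide at best; it is generally regarded as suitable for larger samples.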
Coefficient of Determination
It is the square of r.
It indicates the proportion of the variation in the dependent variable that is explained by the independent variable.
If r = 0.9, then $r^2 = 0.81$: 81% of the variation in the dependent variable is explained by the independent variable.
If r = 0.707, then $r^2 \approx 0.5$: 50% of the variation is explained.
If r = 0.4, then $r^2 = 0.16$: 16% of the variation is explained.
Eg: Income & savings
Rank Correlation
It is used when variables are measured in qualitative (ranked) terms instead of quantitative terms.
Eg: Leadership abilities of managers, judgment in beauty contests, psychological studies, singing competitions etc.
To measure the strength of the relationship between a paired set of such variables, Spearman's rank correlation coefficient is used.
The rank correlation coefficient determines the extent to which the two sets of rankings are in agreement or in disagreement.
The formula is $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, where d is the difference between each pair of ranks and n is the number of pairs in the sample.
Testing Significance of rs
For small samples (about 30 pairs or fewer), the calculated rs is compared with the table value of rs.
If the table value is greater than the calculated value, the null hypothesis is accepted: there is no significant correlation between the two sets of rankings.
If the calculated value is greater than the table value, the null hypothesis is rejected: the correlation between the rankings of the two judges is significant.
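Spearman's coefficient can be sketched in a few lines, assuming the standard formula 1 - 6Σd² / (n(n² - 1)) and no tied ranks. The two judges' rankings below are hypothetical.

```python
# Hypothetical ranks given by two judges to the same eight contestants (no ties).
judge_1 = [1, 2, 3, 4, 5, 6, 7, 8]
judge_2 = [2, 1, 4, 3, 6, 5, 8, 7]

n = len(judge_1)
# d is the difference between each pair of ranks.
sum_d_squared = sum((a - b) ** 2 for a, b in zip(judge_1, judge_2))

r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))
print(round(r_s, 4))
```

The calculated value would then be compared with the tabulated critical value of rs for n = 8 at the chosen level of significance.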
Regression Analysis
"Regression analysis refers to the methods by which estimates are made of the values of a variable from a knowledge of the values of one or more other variables and to the measurement of the errors involved in this estimation process." (Morris Hamburg)
Regression was used by Sir Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.
Purpose
To estimate or predict the unknown values of one variable, known as the dependent variable, from known values of another variable, called the independent variable.
The estimation is made by means of a regression equation, which describes the average relationship between two variables (X and Y).
The second goal of regression analysis is to obtain a measure of the error involved in using the regression line as a basis for estimation. For this, the standard error of estimate is calculated.
Regression & Correlation
The correlation coefficient indicates how close the relationship between two variables is; regression describes the nature of the relationship, which helps to predict values of the dependent variable.
Correlation indicates the presence of association but does not say which variable is the cause and which is the effect. Ex: price and demand; marital status and income.
Regression, on the other hand, presumes a cause-and-effect relationship between the independent and dependent variables, and it necessarily implies association between the two.
There can be nonsense correlation, which is called spurious correlation. Ex: teachers' salaries and consumption of liquor. It does not prove that an increase in the sale of liquor increases teachers' salaries.
There is no such thing as nonsense regression.
Regression lines & equations
Variables may have linear or non-linear relationships.
A linear relationship is described by the equation of a straight line, $y = a + bx$, where a and b are constants; it describes the average relationship that exists between x and y.
There are two regression lines (equations): the regression line of X on Y and the regression line of Y on X.
They intersect at one point. A perpendicular dropped from that point to the x-axis gives the mean of X, and a horizontal line drawn from it to the y-axis gives the mean of Y.
The regression equation of Y on X is computed by solving the normal equations:
$\sum Y = Na + b \sum X$
$\sum XY = a \sum X + b \sum X^2$
The regression equation of X on Y is computed by solving:
$\sum X = Na + b \sum Y$
$\sum XY = a \sum Y + b \sum Y^2$
Standard error of estimate
Regression equations cannot predict perfectly. For example, the sale of gasoline cannot be estimated with 100% accuracy from automobile registrations.
The measure that indicates how precise the prediction of Y is, based on X (or vice versa), is called the standard error of estimate, symbolised as $S_{yx}$.
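A sketch of the standard error of estimate, using the data and the fitted equation Y = 11.9 - 0.65X from the illustration in this deck. The divisor is an assumption: N is used here, while some texts divide by N - 2.

```python
import math

# (X, Y) pairs from the deck's regression illustration.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

# Regression equation of Y on X obtained in the illustration.
a, b = 11.9, -0.65

predicted = [a + b * xi for xi in x]                   # Yc for each X
residuals = [yi - yc for yi, yc in zip(y, predicted)]  # Y - Yc

# Assumption: divide by N (some texts use N - 2 instead).
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / len(y))
print(round(s_yx, 4))
```

The smaller the value of S_yx, the closer the observed points lie to the regression line.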
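Regression equation from correlation
Regression equation of X on Y: $X - \bar{X} = r \, \frac{\sigma_x}{\sigma_y} \, (Y - \bar{Y})$
Regression equation of Y on X: $Y - \bar{Y} = r \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{X})$
Regression coefficient of X on Y: $b_{xy} = r \, \frac{\sigma_x}{\sigma_y}$
Regression coefficient of Y on X: $b_{yx} = r \, \frac{\sigma_y}{\sigma_x}$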
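The two regression coefficients can be obtained from r and the standard deviations, as in this sketch. It uses the data from the deck's illustration and assumes the standard relations b_xy = r·σx/σy and b_yx = r·σy/σx, with population standard deviations.

```python
import math

# (X, Y) pairs from the deck's regression illustration.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Population standard deviations and covariance (divisor N).
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n

r = cov_xy / (sd_x * sd_y)

b_xy = r * sd_x / sd_y  # regression coefficient of X on Y
b_yx = r * sd_y / sd_x  # regression coefficient of Y on X

print(round(b_xy, 2), round(b_yx, 2), round(b_xy * b_yx, 3), round(r ** 2, 3))
```

The product of the two coefficients equals r², so r can be recovered as the square root of b_xy × b_yx, taking the sign the coefficients share.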
Limitations of Regression Analysis
It assumes that the relationship has not changed since the regression equation was computed.
The relationship between the variables holds only up to a certain limit; beyond that point, the relationship changes. Ex: the yield of a crop increases as the dose of fertilizer is increased, but only up to a limit.
Illustration
Obtain the two regression equations from the following data.
         X      Y      XY     X²     Y²
         6      9      54     36     81
         2      11     22     4      121
         10     5      50     100    25
         4      8      32     16     64
         8      7      56     64     49
Total    30     40     214    220    340
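Illustration Continued
Regression equation of X on Y: $X_c = a + bY$
The two normal equations are:
$\sum X = Na + b \sum Y$, that is, $5a + 40b = 30$
$\sum XY = a \sum Y + b \sum Y^2$, that is, $40a + 340b = 214$
Solving them, we get $a = 16.4$ and $b = -1.3$.
The equation is $X = 16.4 - 1.3Y$.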
Illustration Continued
Regression equation of Y on X: $Y_c = a + bX$
The two normal equations are:
$\sum Y = Na + b \sum X$, that is, $5a + 30b = 40$
$\sum XY = a \sum X + b \sum X^2$, that is, $30a + 220b = 214$
Solving them, we get $a = 11.9$ and $b = -0.65$.
The equation is $Y = 11.9 - 0.65X$.
Note: the regression equation of X on Y is used to compute values of X given the values of Y; similarly, the regression equation of Y on X is used to calculate values of Y given the values of X.
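The two pairs of normal equations in this illustration can be checked with a short sketch. The helper name `solve_normal_equations` is hypothetical; it simply solves the two simultaneous equations by elimination, using the column totals from the illustration.

```python
def solve_normal_equations(n, sum_u, sum_v, sum_uv, sum_uu):
    """Solve  sum_v = n*a + b*sum_u  and  sum_uv = a*sum_u + b*sum_uu  for (a, b).

    u is the independent variable and v the dependent variable.
    """
    determinant = n * sum_uu - sum_u ** 2
    b = (n * sum_uv - sum_u * sum_v) / determinant
    a = (sum_v - b * sum_u) / n
    return a, b

# Totals from the illustration: N = 5, sum X = 30, sum Y = 40,
# sum XY = 214, sum X^2 = 220, sum Y^2 = 340.
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 5, 30, 40, 214, 220, 340

# X on Y: Y is the independent variable.
a_xy, b_xy = solve_normal_equations(n, sum_y, sum_x, sum_xy, sum_y2)
# Y on X: X is the independent variable.
a_yx, b_yx = solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2)

print(round(a_xy, 2), round(b_xy, 2))  # X = a_xy + b_xy * Y
print(round(a_yx, 2), round(b_yx, 2))  # Y = a_yx + b_yx * X
```

Both slopes are negative, so X and Y move in opposite directions in this data set.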
Need for ANOVA
Samples drawn from a normal population should have the same variance; this is needed to make a proper comparison.
If the variance between two samples is large, the results give a distorted picture.
ANOVA enables us to break the total variation in the data into components that may be attributed to various sources or causes of variation.
Meaning of Analysis of Variance
It is a technique to find whether the samples are drawn from the same population or not.
Ex: to know whether applying five fertilisers on four plots has any significant effect on their yields.
The purpose of the F test here is not to test the significance of the difference between two sample variances. Its purpose is to test the significance of the differences among sample means.
Difference - t test & F test
ANOVA is known as the F test in honour of R.A. Fisher. It is also known as the variance ratio test.
The t test is adequate when we have the means of two samples.
The F test is gainfully applied when there are three or more samples to consider.
Assumptions in ANOVA
Normality
Homogeneity of variance
Independence of error
If the distributions are bimodal or very skewed, the F test results may not be valid.
Technique of ANOVA
One-way classification: there is only one criterion (factor) and one set of hypotheses.
Two-way classification: two independent factors might have an effect on the response variable of interest, so there are two sets of hypotheses.
One-way classification
Formulate the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
Calculate the variance between the samples
Calculate the variance within the samples
Calculate F = (between-column variance) / (within-column variance)
Compare the calculated value of F with the table value for the degrees of freedom at a certain level of significance
Accept or reject the hypothesis
ANOVA TABLE
Source of variation    SS (sum of squares)    ν (degrees of freedom)    MS (mean square)       Variance ratio F
Between samples        SSC                    ν1 = c - 1                MSC = SSC / (c - 1)    F = MSC / MSE
Within samples         SSE                    ν2 = n - c                MSE = SSE / (n - c)
Total                  SST                    n - 1
Here c is the number of samples (columns) and n is the total number of observations.
SST = Total sum of squares of variations
SSC = Sum of squares between samples (Columns)
SSE = Sum of squares within samples (Rows)
MSC = Mean sum of squares between samples
MSE = Mean sum of squares within samples
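The one-way procedure and the ANOVA table entries can be traced in a short sketch. The three samples below are hypothetical, chosen to give round numbers; the variable names follow the table (SSC, SSE, SST, MSC, MSE).

```python
# Hypothetical samples (columns); not taken from the deck.
samples = [
    [4, 5, 6],
    [6, 7, 8],
    [8, 9, 10],
]

c = len(samples)                        # number of samples (columns)
n = sum(len(s) for s in samples)        # total number of observations
grand_mean = sum(sum(s) for s in samples) / n
sample_means = [sum(s) / len(s) for s in samples]

# Sum of squares between samples (columns).
ssc = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, sample_means))
# Sum of squares within samples.
sse = sum((v - m) ** 2 for s, m in zip(samples, sample_means) for v in s)
# Total sum of squares; equals SSC + SSE.
sst = sum((v - grand_mean) ** 2 for s in samples for v in s)

msc = ssc / (c - 1)   # mean square between samples, df = c - 1
mse = sse / (n - c)   # mean square within samples, df = n - c
f_ratio = msc / mse

print(ssc, sse, sst, msc, mse, f_ratio)
```

The calculated F would then be compared with the table value for (c - 1, n - c) = (2, 6) degrees of freedom, which is about 5.14 at the 5% level, so the null hypothesis of equal means would be rejected for this made-up data.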
Example for one-way classification
The results of four schools for a subject common to all are given below. Make an analysis of variance of the data.
B school 12 11 9 14 4 C school D school 18 12 16 6 8 13 9 12 16 15 8 10 12 8 7
A School
Rationale of the test
The variation within samples measures the influence of the chance forces that cause the individual observations to vary from one another.
The variation between the samples reflects not only the influence of chance forces but also the effect of other forces, which cause the various sample means to differ from one another.
If the null hypothesis is not true, the variation between the samples will tend to be larger than the variation within samples.
ANOVA of two-way classification
When two independent factors might have an effect on the variable of interest, the two-way classification technique is adopted.
