
Dr. Manjunath K R, IMSR

STATISTICAL ANALYSIS OF DATA

7/18/2013

What should a researcher know?

Contents

Introduction to Statistical Analysis
Types of Statistical Measures
Measures of central tendency
Measures of Dispersion
Measures of Association / Relations
Analysis of Variance

Analysis Of Data

Critical examination of the assembled data

For studying the characteristics of the object under study
For determining the pattern of relationships among the variables

Purpose of Statistical Analysis

Summarizes a large mass of data into an understandable form
Makes exact descriptions possible
Helps to identify causal factors
Helps to draw inferences and make predictions
Helps to test hypotheses

Aspects focused in Statistical Analysis

Descriptive Analysis
Comparison
Relationships among variables
Factor analysis

Descriptive Analysis

It involves describing the distribution of respondents
Measures of central tendency are used to describe their characteristics

Comparison

Comparison of two or more distributions to know the relative width of their spread
Measures of dispersion such as range, standard deviation and coefficient of variation are used to compare, supported by ratios, proportions and percentages

Nature of relationship among variables

Relationship between dependent and independent variables
Relationship among various variables
Correlation and regression analysis are used


Types of Statistical Measures


Measures of central tendency
Measures of Dispersion
Measures of Association / Relations
Analysis of Variance

Time series

Measures of Central Tendency



Mean, median and mode are known as descriptive measures
They give a concise description of a group as a whole
They facilitate comparison between groups

Measures of Central Tendency Contd..

If the researcher wants a crude descriptive measure, or the most common or typical category, the mode is used
If an average figure is required for further mathematical manipulation, the arithmetic mean is preferable
To describe a skewed distribution, the median is preferred
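
The choice between the three measures can be illustrated with a small sketch. The income figures are hypothetical (they do not come from the slides) and are chosen so that one large value skews the distribution; Python's standard `statistics` module is assumed.

```python
import statistics

# Hypothetical monthly incomes (in thousands); the last value skews the data.
incomes = [20, 22, 22, 25, 27, 30, 120]

mean_income = statistics.mean(incomes)      # pulled upward by the extreme value
median_income = statistics.median(incomes)  # middle value, robust to the extreme
mode_income = statistics.mode(incomes)      # most frequent value

print("mean:", mean_income)
print("median:", median_income)
print("mode:", mode_income)
```

Here the mean lies well above most of the observations, which is why the median is the safer description of a skewed distribution.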

Measures of Dispersion

Range, mean deviation, standard deviation and coefficient of variation are measures of dispersion
Range is a crude measure of dispersion
Mean deviation considers all values in the distribution, but it does not give much information about the distribution
Standard deviation is the square root of the mean of squared deviations from the arithmetic mean. It is amenable to further mathematical manipulation.

Measures of Dispersion Contd..

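Variance is the square of the standard deviation.
Coefficient of variation (CV) is the standard deviation divided by the arithmetic mean, expressed as a percentage: $CV = \frac{\sigma}{\bar{X}} \times 100$. It indicates relative variation and is useful for comparing distributions.
Standard deviation and the normal distribution: in a normal distribution about 68% of the values lie within mean +/- one standard deviation, about 95% within two, and about 99.7% within three.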
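
A minimal sketch of these measures, assuming two hypothetical groups of observations that have the same spread but very different means. The population standard deviation (`statistics.pstdev`) is used because it matches the definition given in these slides (square root of the mean of squared deviations).

```python
import statistics

# Hypothetical observations: same absolute spread, different level.
group_a = [10, 12, 14, 16, 18]
group_b = [100, 102, 104, 106, 108]

def describe(values):
    """Return range, standard deviation, variance and CV (%) of the values."""
    mean = statistics.mean(values)
    value_range = max(values) - min(values)
    sd = statistics.pstdev(values)   # sqrt of mean squared deviation from the mean
    variance = sd ** 2               # variance is the square of the SD
    cv = sd / mean * 100             # relative variation, in percent
    return value_range, sd, variance, cv

range_a, sd_a, var_a, cv_a = describe(group_a)
range_b, sd_b, var_b, cv_b = describe(group_b)

print("Group A: range", range_a, "SD", round(sd_a, 3), "CV %", round(cv_a, 2))
print("Group B: range", range_b, "SD", round(sd_b, 3), "CV %", round(cv_b, 2))
```

Both groups have the same range and standard deviation, yet the CV of group A is much larger, which is the sense in which CV measures relative variation.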

Measures of Association / Relationship

Identifying the relationship between two or more variables helps to draw conclusions. In examining relationship, the following questions arise.

Is there a relationship between the variables under study?
What is the direction and degree of the relationship?
Is the relationship a causal one?
Is it statistically significant?

Measures of Association / Relationship Contd..

Cross Tabulation and percentage difference are used to measure the association

Ex: Preference of Brand A and age

Brand Preference | Age: Youth | Age: Adult | Total
Yes              | 120 (60%)  | 120 (60%)  | 240
No               | 80 (40%)   | 80 (40%)   | 160
Total            | 200        | 200        | 400

Preference of Brand B and Age

Age   | Brand Preference: Yes | Brand Preference: No | Total
Youth | 140 (70%)             | 60 (30%)             | 200
Adult | 20 (10%)              | 180 (90%)            | 200
Total | 160                   | 240                  | 400
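
The percentage difference for the two cross tabulations can be computed as in the sketch below. The counts are those of the Brand A and Brand B tables; the function names are illustrative only.

```python
# Counts from the Brand A and Brand B cross tabulations.
brand_a = {"Youth": {"Yes": 120, "No": 80}, "Adult": {"Yes": 120, "No": 80}}
brand_b = {"Youth": {"Yes": 140, "No": 60}, "Adult": {"Yes": 20, "No": 180}}

def percent_yes(table, group):
    """Percentage of a given age group that prefers the brand."""
    row = table[group]
    return 100 * row["Yes"] / (row["Yes"] + row["No"])

def percentage_difference(table):
    """Difference in % preferring the brand between Youth and Adult."""
    return percent_yes(table, "Youth") - percent_yes(table, "Adult")

diff_a = percentage_difference(brand_a)
diff_b = percentage_difference(brand_b)

print("Brand A percentage difference:", diff_a)
print("Brand B percentage difference:", diff_b)
```

A difference of zero for Brand A suggests that preference is not associated with age, while the large difference for Brand B suggests a strong association.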

Correlation

Correlation is basically a measure of the degree of linear association between two variables. E.g.: income and level of education; type of seed and yield per acre; years of experience and yearly salary; demand and prices; income and savings; height and weight, etc.

However, it does not necessarily imply that one is the cause of the other. Correlation is an index of the direction (positive or negative) and intensity of association between two variables.
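
As a rough illustration, the coefficient of correlation r can be computed from deviations about the means. The experience and salary figures below are hypothetical, and `pearson_r` is an illustrative helper rather than anything defined in these slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: years of experience and yearly salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 41, 44, 50]

r = pearson_r(experience, salary)
r_negative = pearson_r([1, 2, 3], [3, 2, 1])  # perfectly inverse relationship

print("r for experience and salary:", round(r, 4))
print("r for a perfectly inverse pair:", round(r_negative, 4))
```

A value close to +1 indicates a strong positive linear association; it says nothing by itself about which variable, if either, is the cause.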

Interpretation

If the coefficient is zero or close to zero, it means that the two variables are not linearly correlated.

Even when the values of two variables are correlated and r is calculated, there need not be any cause and effect relationship between the forces affecting the two distributions.

E.g.: Income and Height.


Interpretation Continued

When r = +1, there is a perfect positive relationship.
When r = -1, there is a perfect negative relationship.
When r = 0, there is no linear relationship.
The closer r is to +1 or -1, the closer the relationship.
Full interpretation of r depends upon the circumstances of individual cases.
The closeness of the relationship is not proportional to r.

Coefficient of Correlation and Probable Error

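Probable error of r helps to interpret the reliability of the value of the coefficient of correlation.

The formula is $PE(r) = 0.6745 \, \frac{1 - r^2}{\sqrt{N}}$, where N is the number of pairs of observations.

If r is more than six times the probable error, the correlation is practically certain and it is significant. If r is less than the probable error, there is no evidence of correlation.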
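
The probable error rule can be sketched as follows. The values of r and N are hypothetical, and the function names and the "inconclusive" label for the range in between are assumptions made for this illustration.

```python
import math

def probable_error(r, n):
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret(r, n):
    """Apply the six-times-probable-error rule of thumb."""
    pe = probable_error(r, n)
    if abs(r) > 6 * pe:
        return "significant"
    if abs(r) < pe:
        return "no evidence of correlation"
    return "inconclusive"

# Hypothetical cases with N = 25 pairs of observations.
pe_strong = probable_error(0.8, 25)
verdict_strong = interpret(0.8, 25)
verdict_weak = interpret(0.05, 25)

print("PE for r = 0.8:", round(pe_strong, 4), "->", verdict_strong)
print("r = 0.05 ->", verdict_weak)
```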

Co-efficient of Determination

It is the square of r.

It indicates the proportion of the variation in the dependent variable that is explained by the independent variable.
If r = 0.9, then $r^2 = 0.81$. It means that 81% of the variation in the dependent variable is explained by the independent variable.
If r = 0.707, $r^2 \approx 0.5$. Therefore about 50% of the variation is explained.
If r = 0.4, $r^2 = 0.16$. Therefore 16% of the variation is explained.

E.g.: Income and savings

Rank Correlation

Used when variables are measured in qualitative terms (ranks) instead of quantitative terms.

E.g.: leadership abilities of managers, judgment in beauty contests, psychological studies, singing competitions, etc.
To measure the strength of the relationship between a paired set of such variables, Spearman's rank correlation coefficient is used.

The rank correlation coefficient determines the extent to which the two sets of rankings are in agreement or in disagreement.
The formula is $r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$ (where d is the difference between each pair of ranks and n is the number of pairs in the sample)
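
A minimal sketch of the rank correlation formula, assuming two hypothetical judges who each rank five contestants and no tied ranks (the simple formula does not handle ties).

```python
def spearman_rs(ranks_1, ranks_2):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five contestants by two judges (no ties).
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

rs = spearman_rs(judge_1, judge_2)
rs_reversed = spearman_rs(judge_1, [5, 4, 3, 2, 1])  # complete disagreement

print("rs between the two judges:", round(rs, 3))
print("rs for reversed rankings:", round(rs_reversed, 3))
```

Values near +1 indicate that the two sets of rankings largely agree; a value of -1 indicates that one ranking is the exact reverse of the other.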

Testing Significance of rs

For small samples (n ≤ 30), the null hypothesis is that there is no correlation between the two sets of rankings.
If the table value of rs is greater than the calculated rank correlation coefficient, the null hypothesis is accepted, which means that there is no significant correlation between the rankings.
If the calculated value of rs is greater than the table value, the null hypothesis is rejected, which means that there is a significant correlation between the rankings of the two judges.

Regression Analysis

"Regression analysis refers to the methods by which estimates are made of the values of a variable from a knowledge of the values of one or more other variables and to the measurement of the errors involved in this estimation process." (Morris Hamburg)
The term regression was used by Sir Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.

Purpose

To estimate or predict the unknown values of one variable, known as the dependent variable, from known values of another variable, called the independent variable.
The estimation is made by means of a regression equation, which describes the average relationship between two variables (X and Y).
The second goal of regression analysis is to obtain a measure of the error involved in using the regression line as a basis for estimation. For this, the standard error of estimate is calculated.

Regression & Correlation

The correlation coefficient indicates the closeness of the relationship between two variables, whereas regression explains the nature of the relationship, which helps to predict the values of the dependent variable.
Correlation speaks of the presence of association and does not explain which is the effect and which is the cause. Ex: price and demand; marital status and income.
Regression, on the other hand, presumes a cause and effect relationship between the independent and dependent variables, and it necessarily implies association between the two variables.
There can be nonsense correlation, which is called spurious correlation. Ex: teachers' salaries and consumption of liquor. It does not prove that an increase in the sale of liquor increases teachers' salaries.
There is nothing like nonsense regression.

Regression lines & equations

Variables may have linear or non-linear relationships.
A linear relationship means an equation of a straight line of the form y = a + bx, where a and b are constants; it describes the average relationship that exists between x and y.
There are two regression lines or equations. One is the regression line of x on y and the other is the regression line of y on x.
They intersect at one point. A perpendicular dropped from that point to the x-axis gives the mean of x, and a horizontal line drawn from it to the y-axis gives the mean of y.
Regression equation of Y on X is computed by solving the normal equations:
$\sum Y = Na + b \sum X$
$\sum XY = a \sum X + b \sum X^2$
Regression equation of X on Y is computed by solving:
$\sum X = Na + b \sum Y$
$\sum XY = a \sum Y + b \sum Y^2$

Standard error of estimate

Regression equations cannot predict perfectly. For example, the sale of gasoline cannot be estimated with 100% accuracy from automobile registrations. A measure that indicates how precise the prediction of Y based on X (or vice versa) is, is called the standard error of estimate, symbolised as Syx.

Regression equation from correlation

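Reg. equation of X on Y: $X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$
Reg. equation of Y on X: $Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$

Reg. coefficient of X on Y: $b_{xy} = r \frac{\sigma_x}{\sigma_y}$
Reg. coefficient of Y on X: $b_{yx} = r \frac{\sigma_y}{\sigma_x}$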

Limitations of Regression Analysis

It assumes that the relationship has not changed since the regression equation was computed.
The relationship between the variables may hold only up to a certain limit; beyond that point the relationship changes. Ex: the yield of a crop increases as the dose of fertilizer is increased, but only up to a limit.

Illustration

Obtain 2 regression equations from the following data

      | X  | Y  | XY  | X^2 | Y^2
      | 6  | 9  | 54  | 36  | 81
      | 2  | 11 | 22  | 4   | 121
      | 10 | 5  | 50  | 100 | 25
      | 4  | 8  | 32  | 16  | 64
      | 8  | 7  | 56  | 64  | 49
Total | 30 | 40 | 214 | 220 | 340

Illustration Continued

Reg. eq. of X on Y is: $X_c = a + bY$

The two normal equations are:
$\sum X = Na + b \sum Y$, i.e. $5a + 40b = 30$
$\sum XY = a \sum Y + b \sum Y^2$, i.e. $40a + 340b = 214$
Solving them we get a = 16.4; b = -1.3
Equation is $X = 16.4 - 1.3Y$

Illustration Continued

Reg. eq. of Y on X is: $Y_c = a + bX$

The two normal equations are:
$\sum Y = Na + b \sum X$, i.e. $5a + 30b = 40$
$\sum XY = a \sum X + b \sum X^2$, i.e. $30a + 220b = 214$
Solving them we get a = 11.9; b = -0.65
Equation is $Y = 11.9 - 0.65X$
Note: The regression equation of X on Y is used to compute values of X given the values of Y; similarly, the regression equation of Y on X is used to calculate the values of Y given the values of X.
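
The two regression equations of the illustration can be checked with a short sketch that solves the normal equations directly. The X and Y values are those of the illustration (the fifth X value, 8, follows from its X^2 entry of 64 and XY entry of 56); `fit_line` is an illustrative helper name.

```python
# Data from the illustration.
xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]

def fit_line(predictor, response):
    """Solve the two normal equations for response = a + b * predictor."""
    n = len(predictor)
    sum_p = sum(predictor)
    sum_r = sum(response)
    sum_pr = sum(p * r for p, r in zip(predictor, response))
    sum_pp = sum(p * p for p in predictor)
    b = (n * sum_pr - sum_p * sum_r) / (n * sum_pp - sum_p ** 2)
    a = (sum_r - b * sum_p) / n
    return a, b

a_xy, b_xy = fit_line(ys, xs)  # regression of X on Y
a_yx, b_yx = fit_line(xs, ys)  # regression of Y on X

print("X on Y: X =", round(a_xy, 2), "+", round(b_xy, 2), "* Y")
print("Y on X: Y =", round(a_yx, 2), "+", round(b_yx, 2), "* X")

# Estimated Y for a new X, using the Y-on-X equation.
y_estimate = a_yx + b_yx * 5
print("Estimated Y when X = 5:", round(y_estimate, 2))
```

Both lines pass through the point of means (X = 6, Y = 8), which is where the two regression lines intersect.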

Need for ANOVA

Samples drawn from a normal population should have the same variance. This is needed to make a proper comparison.
If the variances of the samples differ widely, the results give a distorted picture.
ANOVA enables us to analyze the total variation of our data into components which may be attributed to various sources or causes of variation.

Meaning of Analysis of Variance

It is a technique to find whether the samples are drawn from the same population or not.
For example: to know whether there is any significant effect of the application of five fertilisers on four plots on their yields.
In ANOVA, the purpose of the F test is not to test the significance of the difference between two sample variances. Its purpose is to test the significance of the differences among sample means.

Difference - t test & F test

The test used in ANOVA is known as the F test, in honour of R.A. Fisher. It is also known as the variance ratio test.
The t test is adequate when we have the means of two samples.
The F test is gainfully applied when there are three or more samples to consider.

Assumptions in ANOVA

Normality
Homogeneity of variances
Independence of errors
If the distributions are bimodal or very skewed, the F test results may not be valid.

Technique of ANOVA

One-way classification
  there is only one criterion
  one set of hypotheses

Two-way classification
  two independent factors might have an effect on the response variable of interest
  two sets of hypotheses

One-way classification

Formulate the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
Calculate the variance between the samples
Calculate the variance within the samples
Calculate F = (between-column variance) / (within-column variance)
Compare the calculated value of F with the table value for the degrees of freedom at a certain level of significance
Accept or reject the hypothesis
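
The steps of the one-way classification can be sketched as below. The three samples are hypothetical (they are not the school data from these slides), and `one_way_anova` is an illustrative helper. The table value of about 5.14 is the standard 5% point of F for (2, 6) degrees of freedom.

```python
def one_way_anova(samples):
    """Return SSC, SSE, SST and F for a one-way classification."""
    all_values = [v for sample in samples for v in sample]
    n = len(all_values)   # total number of observations
    c = len(samples)      # number of samples (columns)
    grand_mean = sum(all_values) / n
    means = [sum(sample) / len(sample) for sample in samples]

    # Sum of squares between samples (columns).
    ssc = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Sum of squares within samples.
    sse = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    # Total sum of squares.
    sst = sum((v - grand_mean) ** 2 for v in all_values)

    msc = ssc / (c - 1)   # mean square between samples
    mse = sse / (n - c)   # mean square within samples
    return ssc, sse, sst, msc / mse

# Hypothetical samples, three observations each.
samples = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
ssc, sse, sst, f_value = one_way_anova(samples)

F_TABLE_5_PERCENT = 5.14  # table value for (2, 6) degrees of freedom at 5%
reject_null = f_value > F_TABLE_5_PERCENT

print("SSC:", ssc, "SSE:", sse, "SST:", sst)
print("F:", f_value, "reject H0:", reject_null)
```

Since the calculated F exceeds the table value, the null hypothesis of equal means would be rejected for these hypothetical samples.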

ANOVA TABLE

Source of variation | SS (sum of squares) | v (degrees of freedom) | MS (mean square)  | Variance ratio F
Between samples     | SSC                 | v1 = c - 1             | MSC = SSC/(c - 1) | F = MSC/MSE
Within samples      | SSE                 | v2 = n - c             | MSE = SSE/(n - c) |
Total               | SST                 | n - 1                  |                   |

SST = Total sum of squares of variations
SSC = Sum of squares between samples (columns)
SSE = Sum of squares within samples (rows)
MSC = Mean sum of squares between samples
MSE = Mean sum of squares within samples

Example for one way classification

The results of four schools for a subject common to all are given below. Make an analysis of variance of the data.
B school 12 11 9 14 4 C school D school 18 12 16 6 8 13 9 12 16 15 8 10 12 8 7

A School

Rationale of the test

The variation within samples measures the influence of the chance forces which cause the individual observations to vary from one another.
The variation between the samples reflects not only the influence of chance forces but also the effect of other forces, which cause the various sample means to differ from one another.
If the null hypothesis is not true, the variation between the samples will tend to be larger than the variation within samples.

ANOVA of two way classification

When two independent factors might have an effect on the variable of interest, two way classification technique is adopted.
