Manjunath K R, IMSR
7/18/2013
Contents
Measures of Dispersion
Measures of Association / Relations
Analysis of Variance
Analysis Of Data
For studying the characteristics of the object under study
For determining the pattern of relationships among the variables
Descriptive Analysis
It involves the distribution of respondents.
Measures of central tendency are used to describe their characteristics.
Comparison
Two or more distributions are compared to assess the relative width of their spread.
Measures of dispersion such as the range, standard deviation and coefficient of variation, along with ratios, proportions and percentages, are used for the comparison.
Relationship between dependent and independent variables
Relationship among various variables
Correlation and regression analysis are used
Contents
Measures of central tendency
Measures of Dispersion
Measures of Association / Relations
Analysis of Variance
Time series
If the researcher wants a crude descriptive measure, or the most common or typical category, the mode is used.
Measures of Dispersion
Range, mean deviation, standard deviation and coefficient of variation are the measures of dispersion.
Range is a crude measure of dispersion.
Mean deviation considers all values in the distribution, but it does not give much information about the distribution.
Standard deviation is computed as the square root of the mean of squared deviations from the arithmetic mean. It is amenable to further mathematical manipulation.
Variance is the square of the standard deviation.
The coefficient of variation (CV) is the standard deviation divided by the arithmetic mean, expressed as a percentage. It indicates relative variation and is useful for comparison.
Standard deviation and the normal distribution: in a normal distribution about 68% of values lie within mean +/- one standard deviation, about 95% within +/- two, and about 99.7% within +/- three.
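The measures of dispersion described above can be sketched in a few lines of Python. This is only an illustration: the function name `dispersion_summary` and the sample values are assumptions, and the variance divides by n (the "mean of squared deviations" wording), not by n - 1.

```python
import math

def dispersion_summary(values):
    """Return the basic measures of dispersion for a list of numbers.

    Uses the population formulas (divide by n), following the
    'mean of squared deviations' definition of standard deviation.
    """
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)                  # crude measure
    mean_deviation = sum(abs(v - mean) for v in values) / n  # uses all values
    variance = sum((v - mean) ** 2 for v in values) / n      # square of SD
    std_dev = math.sqrt(variance)
    cv_percent = std_dev / mean * 100                        # relative variation
    return {
        "mean": mean,
        "range": value_range,
        "mean_deviation": mean_deviation,
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": cv_percent,
    }

# Hypothetical sample values, chosen only for illustration.
sample = [6, 2, 10, 4, 8]
summary = dispersion_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because the coefficient of variation is unit-free, it can be used to compare the spread of two distributions measured on different scales.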
Identifying the relationship between two or more variables helps to draw conclusions. In examining a relationship, the following questions arise:
Is there a relationship between the variables under study?
What is the direction and degree of the relationship?
Is the relationship a causal one?
Is it statistically significant?
Cross Tabulation and percentage difference are used to measure the association
            No          Total
            80 (40%)    200
            80 (40%)    200
Total       160         400
Age                                 Total
Youth       140 (70%)   60 (30%)    200
Adult       20 (10%)    180 (90%)   200
Total       160         240         400
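As a rough sketch of how the percentage difference is read off a cross tabulation, the snippet below recomputes the row percentages for the Youth and Adult rows and subtracts them. The counts come from the table; the names `row_percentages` and `percentage_difference` are assumptions made for the illustration, and the two response columns are left unlabelled because their headings are not given.

```python
# Counts for the two response columns, taken from the age cross tabulation.
# The column meanings are not stated in the table, so they are kept generic.
table = {
    "Youth": (140, 60),
    "Adult": (20, 180),
}

def row_percentages(counts):
    """Convert a row of counts into percentages of the row total."""
    total = sum(counts)
    return [100 * c / total for c in counts]

youth_pct = row_percentages(table["Youth"])   # first column share for Youth
adult_pct = row_percentages(table["Adult"])   # first column share for Adult

# Percentage difference in the first response column between the two rows.
percentage_difference = youth_pct[0] - adult_pct[0]
print(f"Youth: {youth_pct[0]:.0f}%, Adult: {adult_pct[0]:.0f}%, "
      f"difference: {percentage_difference:.0f} percentage points")
```

A difference close to zero would suggest little or no association between age group and response; a large difference, as here, suggests a strong association.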
Correlation
Correlation is basically a measure of the degree of linear association between two variables. E.g.: income and level of education; type of seed and yield per acre; years of experience and yearly salary; demand and prices; income and savings; height and weight.
However it does not necessarily imply that one is the cause of the other. Correlation is an index of positive or negative association or intensity of association between two variables.
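A minimal sketch of the product-moment (Pearson) correlation coefficient is given below, assuming the usual computational formula based on sums. The function name `pearson_r` is an assumption; the five (X, Y) pairs are the ones used in the regression illustration later in these slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums.

    r = (N*SUM(XY) - SUM(X)*SUM(Y)) /
        sqrt((N*SUM(X^2) - SUM(X)^2) * (N*SUM(Y^2) - SUM(Y)^2))
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data pairs from the regression illustration in these slides.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
r = pearson_r(x, y)
print(f"r = {r:.3f}")  # negative: Y tends to fall as X rises
```

The sign of r gives the direction of the association and its magnitude gives the intensity; neither says anything about which variable, if any, is the cause.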
Interpretation
If the coefficient is zero or close to zero, it means that the two variables are not linearly correlated.
Even when the values of two variables are correlated and r is calculated, this does not by itself establish a cause and effect relationship between the forces affecting the distribution of the items.
Interpretation Continued
Probable error of r helps to interpret the reliability of the value of coefficient of correlation.
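The slide does not state the formula, so the sketch below assumes the conventional textbook definition, PE = 0.6745 (1 - r^2) / sqrt(n), together with the common rule of thumb that r is treated as not significant when it is smaller than PE and as significant when it exceeds six times PE. The function names and the example values of r and n are hypothetical.

```python
import math

def probable_error(r, n):
    """Probable error of r under the conventional textbook formula
    (an assumption here): PE = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret_r(r, n):
    """Apply the usual rule of thumb (assumed, not taken from the slide)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"

# Hypothetical values for illustration only.
pe = probable_error(0.9, 25)
verdict = interpret_r(0.9, 25)
print(f"PE = {pe:.4f}, r is {verdict}")
```

The rule of thumb is intended for reasonably large samples drawn from an approximately normal population, so the verdict should be read with that caveat.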
Co-efficient of Determination
It is the square of r, written $r^2$.
It indicates the proportion of the variation in the dependent variable that is explained by the independent variable.
If r = 0.9, then $r^2 = 0.81$: 81% of the variation in the dependent variable is explained by the independent variable. If r = 0.707, then $r^2 = 0.5$, so 50% of the variation is explained. If r = 0.4, then $r^2 = 0.16$, so 16% of the variation is explained.
Rank Correlation
Rank correlation is used for attributes that can be ranked but not measured precisely. E.g.: leadership abilities of managers, judgment in beauty contests, psychological studies, singing competitions.
To measure the strength of the relationship between a paired set of such ranked variables, Spearman's rank correlation coefficient is used.
The rank correlation coefficient determines the extent to which the two sets of ranking are in agreement or in disagreement.
The formula is $r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$, where d is the difference between each pair of ranks and n is the number of pairs in the sample.
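The rank correlation formula can be sketched as follows, assuming there are no tied ranks. The function name `spearman_rs` and the two judges' rankings are hypothetical, chosen only to show the arithmetic.

```python
def spearman_rs(ranks_a, ranks_b):
    """Spearman's rank correlation, r_s = 1 - 6*SUM(d^2) / (n*(n^2 - 1)).

    Assumes both inputs are rankings of the same n items with no ties.
    """
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of five contestants by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
rs = spearman_rs(judge_1, judge_2)
print(f"r_s = {rs:.2f}")
```

A value of +1 means the two rankings agree completely, -1 means one ranking is the exact reverse of the other, and values near 0 mean there is little agreement.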
Testing Significance of rs
For small samples (up to about 30 pairs), if the table value of $r_s$ is greater than the calculated rank correlation coefficient, the null hypothesis of no correlation between the rankings is not rejected, which means that there is no significant association between the two sets of ranks. If the calculated value of $r_s$ is greater than the table value, the null hypothesis is rejected, which means that there is significant agreement between the rankings of the two judges.
Regression Analysis
"Regression analysis refers to the methods by which estimates are made of the values of a variable from a knowledge of the values of one or more other variables and to the measurement of the errors involved in this estimation process." (Morris Hamburg)
The term regression was used by Sir Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.
Purpose
To estimate or predict the unknown values of one variable, known as the dependent variable, from known values of another variable, called the independent variable.
The estimation is made by means of a regression equation, which describes the average relationship between the two variables (X and Y).
The second goal of regression analysis is to obtain a measure of the error involved in using the regression line as a basis for estimation. For this, the standard error of estimate is calculated.
The correlation coefficient indicates how close the relationship between two variables is, whereas regression explains the nature of the relationship, which helps to predict the values of the dependent variable.
Correlation speaks of the presence of association and does not explain which is the effect and which is the cause. Ex: price and demand; marital status and income.
Regression, on the other hand, presupposes a cause and effect relationship between the independent and dependent variables, and it necessarily implies association between the two.
There can be nonsense correlation, which is called spurious correlation. Ex: teachers' salaries and consumption of liquor. It does not prove that an increase in the sale of liquor increases teachers' salaries. There is no such thing as nonsense regression.
Variables may have linear or non-linear relationships. A linear relationship is described by the equation of a straight line, $y = a + bx$, where a and b are constants; it describes the average relationship that exists between x and y.
There are two regression lines or equations: the regression line of X on Y and the regression line of Y on X. They intersect at the point $(\bar{x}, \bar{y})$: a perpendicular dropped from that point to the x-axis gives the mean of x, and a horizontal line drawn to the y-axis gives the mean of y.
The regression equation of Y on X is computed by solving the normal equations:
$\sum Y = Na + b \sum X$
$\sum XY = a \sum X + b \sum X^2$
The regression equation of X on Y is computed by solving:
$\sum X = Na + b \sum Y$
$\sum XY = a \sum Y + b \sum Y^2$
Regression equations cannot predict perfectly. For example, the sale of gasoline cannot be estimated with 100% accuracy from automobile registrations. A measure that indicates how precise the prediction of Y based on X (or vice versa) is, is called the standard error of estimate, symbolised as $S_{yx}$.
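The slide does not give the formula for the standard error of estimate, so the sketch below assumes the common definition, the square root of the mean squared deviation of the observed Y values from the values predicted by the regression line. Some texts divide by N - 2 instead of N; the `divisor_adjust` parameter is a hypothetical knob for that choice. The data and the coefficients a = 11.9, b = -0.65 are those of the Y on X line in the illustration in these slides.

```python
import math

def standard_error_of_estimate(xs, ys, a, b, divisor_adjust=0):
    """S_yx = sqrt(SUM((Y - Y_hat)^2) / (N - divisor_adjust)),
    where Y_hat = a + b*X is the value predicted by the regression line.

    divisor_adjust=0 divides by N; divisor_adjust=2 divides by N - 2.
    """
    n = len(xs)
    squared_errors = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_errors / (n - divisor_adjust))

# Data and Y-on-X coefficients from the illustration in these slides.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
s_yx = standard_error_of_estimate(x, y, a=11.9, b=-0.65)
print(f"S_yx = {s_yx:.3f}")
```

The smaller the standard error of estimate, the closer the observed points lie to the regression line and the more precise the prediction.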
Regression analysis rests on the assumption that the relationship has not changed since the regression equation was computed.
The relationship between the variables holds only up to a certain limit; beyond that point, the relationship changes. Ex: crop yield increases as doses of fertiliser are increased, but this holds good only up to a limit.
Illustration
         X     Y     XY    X2    Y2
         6     9     54    36    81
         2     11    22    4     121
         10    5     50    100   25
         4     8     32    16    64
         8     7     56    64    49
Total    30    40    214   220   340
Illustration Continued
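For the regression equation of X on Y, the two normal equations are:
$\sum X = Na + b \sum Y$, i.e. 5a + 40b = 30
$\sum XY = a \sum Y + b \sum Y^2$, i.e. 40a + 340b = 214
Solving them we get a = 16.4; b = -1.3
The equation is X = 16.4 - 1.3Y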
Illustration Continued
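For the regression equation of Y on X, the two normal equations are:
$\sum Y = Na + b \sum X$, i.e. 5a + 30b = 40
$\sum XY = a \sum X + b \sum X^2$, i.e. 30a + 220b = 214
Solving them we get a = 11.9; b = -0.65
The equation is Y = 11.9 - 0.65X
Note: The regression equation of X on Y is used to compute values of X given the values of Y; similarly, the regression equation of Y on X is used to calculate the values of Y given the values of X.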
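The two regression lines of the illustration can be checked with a short script. This is a sketch that solves the pair of normal equations in closed form; the function name `fit_line` is an assumption, while the data are the five (X, Y) pairs of the illustration.

```python
def fit_line(predictor, response):
    """Solve the normal equations for response = a + b * predictor.

    SUM(response)           = N*a + b*SUM(predictor)
    SUM(predictor*response) = a*SUM(predictor) + b*SUM(predictor^2)
    """
    n = len(predictor)
    sum_p = sum(predictor)
    sum_r = sum(response)
    sum_pr = sum(p * r for p, r in zip(predictor, response))
    sum_p2 = sum(p * p for p in predictor)
    b = (n * sum_pr - sum_p * sum_r) / (n * sum_p2 - sum_p ** 2)
    a = (sum_r - b * sum_p) / n
    return a, b

# Data from the illustration.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

a_xy, b_xy = fit_line(predictor=y, response=x)  # regression of X on Y
a_yx, b_yx = fit_line(predictor=x, response=y)  # regression of Y on X

print(f"X on Y: X = {a_xy:.1f} + ({b_xy:.2f})Y")
print(f"Y on X: Y = {a_yx:.1f} + ({b_yx:.2f})X")
```

Both slopes are negative here, and their product, 1.3 x 0.65 = 0.845, equals the coefficient of determination for these data.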
Analysis of Variance
Samples drawn from a normal population should have the same variance; this is needed to make a proper comparison.
If the variance differs greatly between samples, the results give a distorted picture.
Analysis of variance enables us to split the total variation of our data into components that may be attributed to various sources or causes of variation.
It is a technique to find whether the samples are drawn from the same population or not. For example: to know whether the application of five fertilisers on four plots has any significant effect on their yields.
The purpose of the F test here is not to test the significance of the difference between two sample variances. Its purpose is to test the significance of the differences among sample means.
ANOVA is known as the F test in honour of R.A. Fisher. It is also known as the variance ratio test.
The t test is adequate when we compare the means of two samples.
The F test is gainfully applied when there are three or more samples to consider.
Assumptions in ANOVA
Normality
Homogeneity of variance
Independence of errors
If the distributions are bimodal or very skewed, the F test results may not be valid.
Technique of ANOVA
One-way classification: there is a single independent factor; one hypothesis is tested.
Two-way classification: two independent factors might have an effect on the response variable of interest; two sets of hypotheses are tested.
One-way classification
Formulate the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
Calculate the variance between the samples
Calculate the variance within the samples
Calculate F = (between-column variance) / (within-column variance)
Compare the calculated value of F with the table value for the relevant degrees of freedom at a chosen level of significance.
Accept or reject the hypothesis
ANOVA TABLE
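Source of variation    SS (sum of squares)    Degrees of freedom    MS (mean square)       Variance ratio F
Between samples        SSC                    c - 1                 MSC = SSC / (c - 1)    F = MSC / MSE
Within samples         SSE                    N - c                 MSE = SSE / (N - c)
Total                  SST                    N - 1

SST = Total sum of squares of variations
SSC = Sum of squares between samples (columns)
SSE = Sum of squares within samples (rows)
MSC = Mean sum of squares between samples
MSE = Mean sum of squares within samples
c = number of samples (columns); N = total number of observations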
The results of four schools for a subject common to all are given below. Make an analysis of variance of the data.
A school    B school    C school    D school
8           12          18          13
10          11          12          9
12          9           16          12
8           14          6           16
7           4           8           15
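The one-way classification steps can be sketched on the school data as follows. The function name `one_way_anova` is an assumption, and so is the assignment of each group of five marks to a particular school; the value of F does not depend on which school carries which label.

```python
def one_way_anova(groups):
    """One-way ANOVA: split total variation into between-sample (SSC)
    and within-sample (SSE) parts and form the variance ratio F."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between, df_within = k - 1, n_total - k
    msc = ssc / df_between
    mse = sse / df_within
    return {
        "SSC": ssc, "SSE": sse, "SST": ssc + sse,
        "df_between": df_between, "df_within": df_within,
        "MSC": msc, "MSE": mse, "F": msc / mse,
    }

# Marks from the example; the school labels are an assumed reading.
schools = {
    "A": [8, 10, 12, 8, 7],
    "B": [12, 11, 9, 14, 4],
    "C": [18, 12, 16, 6, 8],
    "D": [13, 9, 12, 16, 15],
}
result = one_way_anova(list(schools.values()))
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

The calculated F is about 1.28 with (3, 16) degrees of freedom. The tabulated value at the 5% level for these degrees of freedom is about 3.24, so on this reading of the data the differences among the school means would not be judged significant.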
The variation within samples measures the influence of the chance forces that cause the individual observations to vary from one another.
The variation between the samples reflects not only the influence of chance forces but also the effect of other forces, which cause the various sample means to differ from one another.
If the null hypothesis is not true, the variation between the samples will tend to be larger than the variation within samples.
When two independent factors might have an effect on the variable of interest, two way classification technique is adopted.