Objectives
To determine "typical" values of variables.
To look into measures of the spread of data.
To look into measures of how variables are correlated.
To determine how to reduce variables through factor analysis.
To look at how to assess the reliability of data.
Content
Descriptive Statistics
Correlation Analysis
Factor Analysis
Reliability Analysis
Measures of Location
The sample mean of n observations is
$$\bar{x} = \frac{\sum x_i}{n}$$
The mean is an appropriate measure of the center of the data if the data have a symmetric distribution with light tails. The median is the better choice if the distribution has heavy tails or is asymmetric. The median is resistant: a few extreme observations have little effect on it.
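A small illustration of resistance, using made-up numbers rather than the course data: replacing one observation with an extreme value moves the mean a long way but leaves the median where it was.

```r
# Hypothetical sample, chosen only to illustrate resistance
x <- c(2, 3, 4, 5, 6)
mean(x)        # 4
median(x)      # 4

# Same sample with the largest value replaced by an extreme observation
x_out <- c(2, 3, 4, 5, 60)
mean(x_out)    # 14.8
median(x_out)  # 4
```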
To calculate the quartiles:
1) Arrange the observations in increasing order and locate the median M.
2) The first quartile Q1 is the median of the observations located to the left of the median in the ordered list.
3) The third quartile Q3 is the median of the observations located to the right of the median in the ordered list.
The interquartile range (IQR) is defined as: IQR = Q3 - Q1
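The three steps above can be sketched in R as follows. The helper name quartiles_by_halves and the data are made up for illustration, and the sketch assumes at least two observations. R's built-in quantile() and IQR() use a different interpolation rule by default, so they may return slightly different values for the same data.

```r
# Quartiles by the "median of each half" rule.
# quartiles_by_halves is a hypothetical helper, not a base R function.
quartiles_by_halves <- function(x) {
  x <- sort(x)                      # step 1: order the observations
  n <- length(x)
  half <- floor(n / 2)              # size of each half; the middle value is left out when n is odd
  lower <- x[seq_len(half)]         # observations to the left of the median
  upper <- x[(n - half + 1):n]      # observations to the right of the median
  c(Q1 = median(lower), M = median(x), Q3 = median(upper))
}

x <- c(7, 1, 5, 9, 3, 11, 13)       # made-up data
q <- quartiles_by_halves(x)
q                                   # Q1 = 3, M = 7, Q3 = 11
iqr <- unname(q["Q3"] - q["Q1"])
iqr                                 # 8
```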
Five-Number Summary
The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
Minimum Q1 M Q3 Maximum
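As a quick sketch, base R's fivenum() returns these five values in the same order. It computes Tukey's hinges, which coincide with the quartile rule used in these slides when the number of observations is even and can differ slightly when it is odd. The data below are made up.

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12)    # made-up data, n = 8 (even)
fn <- fivenum(x)                    # minimum, lower hinge, median, upper hinge, maximum
names(fn) <- c("Minimum", "Q1", "M", "Q3", "Maximum")
fn                                  # 2, 4, 6, 9.5, 12
```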
The most common measure of spread looks at how far each observation is from the mean. This measure is called the standard deviation.
The standard deviation $s_x$ measures the average distance of the observations from their mean. It is calculated by finding an average of the squared distances, using n - 1 as the divisor, and then taking the square root. This average squared distance is called the variance, $s_x^2$.
$$\text{variance} = s_x^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \sum (x_i - \bar{x})^2$$
$$\text{standard deviation} = s_x = \sqrt{\frac{1}{n - 1} \sum (x_i - \bar{x})^2}$$
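A minimal sketch of the calculation, step by step, on made-up data. The results are compared with R's built-in var() and sd(), which also divide by n - 1.

```r
x <- c(4, 8, 6, 5, 7)               # made-up data
n <- length(x)
xbar <- mean(x)                     # 6
dev2 <- (x - xbar)^2                # squared distances from the mean: 4, 4, 0, 1, 1
s2 <- sum(dev2) / (n - 1)           # variance: 10 / 4 = 2.5
s <- sqrt(s2)                       # standard deviation

all.equal(s2, var(x))               # TRUE
all.equal(s, sd(x))                 # TRUE
```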
Frequency Table
> prob <- dogB$prob  # saves some typing effort
> prob.freq = table(prob)
> prob.freq
prob
 1  2  3  4  5  6  7  8  9 10
 8 16  8  8  8 12  8  8 12 12
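Counts are often reported together with percentages. The sketch below uses made-up responses, since the dogB data are not reproduced here, and converts a frequency table to percentages with prop.table().

```r
# Made-up responses on a 1-5 scale (not the dogB data)
resp <- c(1, 2, 2, 3, 3, 3, 4, 5, 5, 2)
resp.freq <- table(resp)                  # count of each distinct value
resp.freq
resp.pct <- prop.table(resp.freq) * 100   # relative frequencies in percent
resp.pct
```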
R Multiple Response
> subdata <- dogB[, c('kbblemix', 'mixbars', 'bonnys', 'doggies')]  # subdata contains 4 columns representing the 4 brands
> b = sum(subdata == "r")  # b is the total number of brand recalled responses
> b
[1] 216
> d = colSums(subdata == "r")  # number of brand recalled responses for each brand
> d
kbblemix  mixbars   bonnys  doggies
      52       56       52       56
> f = as.numeric(c(d, b))  # f stores the frequencies
> f
[1]  52  56  52  56 216
> data.frame(brands = c(names(d), "Total"), freq = f, percent = (f/b)*100)  # produce the output
    brands freq   percent
1 kbblemix   52  24.07407
2  mixbars   56  25.92593
3   bonnys   52  24.07407
4  doggies   56  25.92593
5    Total  216 100.00000
Sample Covariance
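$$\operatorname{Cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
The sample correlation is the covariance divided by the product of the two standard deviations:
$$r = \frac{\operatorname{Cov}(x, y)}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$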
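The covariance and correlation formulas can be checked by hand on a small made-up data set. This is only a sketch; the variable names are illustrative, and the results are compared with R's cov() and cor().

```r
x <- c(1, 2, 3, 4, 5)               # made-up data
y <- c(2, 4, 5, 4, 5)
N <- length(x)

cross <- sum((x - mean(x)) * (y - mean(y)))   # sum of cross-products: 6
cov_xy <- cross / (N - 1)                     # sample covariance: 1.5
r_xy <- cov_xy / (sd(x) * sd(y))              # sample correlation

all.equal(cov_xy, cov(x, y))        # TRUE
all.equal(r_xy, cor(x, y))          # TRUE
```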
R Correlation
To calculate the covariance and correlation among environ2, environ3 and environ5, first extract the three variables from data:
> green <- data[, c('environ2', 'environ3', 'environ5')]
> cov(green)  # calculate the covariance matrix
           environ2   environ3   environ5
environ2 0.25212121 0.09090909 0.09575758
environ3 0.09090909 0.25252525 0.08585859
environ5 0.09575758 0.08585859 0.25242424
> cor(green)  # calculate the correlation matrix
          environ2  environ3  environ5
environ2 1.0000000 0.3602883 0.3795796
environ3 0.3602883 1.0000000 0.3400680
environ5 0.3795796 0.3400680 1.0000000
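The two matrices are linked by the correlation formula: each covariance is divided by the product of the two standard deviations. As a sketch, the covariance matrix printed above can be typed in by hand (the underlying data are not reproduced here) and converted with base R's cov2cor().

```r
vars <- c("environ2", "environ3", "environ5")
# Covariance matrix copied from the printed output
S <- matrix(c(0.25212121, 0.09090909, 0.09575758,
              0.09090909, 0.25252525, 0.08585859,
              0.09575758, 0.08585859, 0.25242424),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

R <- cov2cor(S)                     # divide each entry by s_i * s_j
round(R, 7)                         # agrees with the cor(green) output
```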
Factor Analysis
> library(rela)  # requires the rela package
> g = as.matrix(green)  # change data file to a matrix
> pa <- paf(g)  # use paf function in rela to do factor analysis
> summary(pa)  # obtain the output
> barplot(pa$Eigenvalues[,1]) # draw the first column of eigenvalues
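For intuition about what such a plot shows, the sketch below computes the eigenvalues of the correlation matrix printed earlier, using base R only. This is an assumption-laden stand-in, not the paf() computation: principal axis factoring works with communality estimates, so the eigenvalues it reports may differ from these.

```r
# Correlation matrix copied from the cor(green) output
R <- matrix(c(1.0000000, 0.3602883, 0.3795796,
              0.3602883, 1.0000000, 0.3400680,
              0.3795796, 0.3400680, 1.0000000),
            nrow = 3, byrow = TRUE)

ev <- eigen(R, symmetric = TRUE)$values   # eigenvalues in decreasing order
ev
sum(ev)                                   # equals the number of variables, 3
```

Calling barplot(ev) on this vector draws the same kind of chart. One large eigenvalue followed by small ones suggests that a single factor accounts for most of the shared variation.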
> pav <- varimax(pa$Factor.Loadings)  # Varimax rotation
> pav  # print the rotated loadings
> scores <- g %*% as.matrix(pav$loadings)  # get factor scores
R Reliability Procedure
> library(multilevel)
> green1 <- data[, c('environ1', 'environ4')]  # taken from data, since green holds only environ2, environ3 and environ5
> green2 <- green[, c('environ2', 'environ3', 'environ5')]
> cronbach(green1)$Alpha
[1] 0.85513
> cronbach(green2)$Alpha
[1] 0.62788
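Cronbach's alpha can also be computed from a covariance matrix with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). As a sketch, applying it to the covariance matrix of environ2, environ3 and environ5 printed in the correlation section (typed in by hand here) reproduces the second value above.

```r
# Covariance matrix of environ2, environ3, environ5, copied from the cov(green) output
S <- matrix(c(0.25212121, 0.09090909, 0.09575758,
              0.09090909, 0.25252525, 0.08585859,
              0.09575758, 0.08585859, 0.25242424),
            nrow = 3, byrow = TRUE)

k <- ncol(S)                        # number of items
item_var <- sum(diag(S))            # sum of the item variances
total_var <- sum(S)                 # variance of the summed score
alpha <- (k / (k - 1)) * (1 - item_var / total_var)
round(alpha, 5)                     # 0.62788
```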
References
Moore, D., W. Notz & M. Fligner. The Basic Practice of Statistics, 6th ed., Chapters 2 and 4 (pp. 106-114).