
Overview

Main topics in multivariate statistics

Exploratory methods
- Graphics for multivariate data
- Principal component analysis (PCA)
- Possible uses of PCA
- Factor analysis: idea
- Factor analysis: model
- Factor analysis
- Linear discriminant analysis
- Cluster analysis
- Multidimensional scaling

More formal methods
- Normal distribution theory
- Tests of significance for multivariate data
- Canonical correlation
- Remaining topics we did not cover

Main topics in multivariate statistics


- We have data on several variables, there is some interdependence between the variables, and none of them is clearly the main variable of interest.
- Methods that are mostly of an exploratory nature:
  - Graphics for multivariate data
  - Principal component analysis (PCA)
  - Factor analysis
  - Linear discriminant analysis (LDA)
  - Cluster analysis
  - Multidimensional scaling
  - ...

Main topics in multivariate statistics


- More formal topics:
  - Normal distribution theory
  - Tests of significance for multivariate data
  - Multivariate analysis of variance (MANOVA)
  - Canonical correlation analysis
  - ...

Exploratory methods
Graphics for multivariate data
- Goal: visualize multivariate data.
- We covered (a short R sketch follows below):
  - Scatterplot matrix: pairs()
  - Star plots and segment plots: stars()
  - Conditioning plots: coplot()
  - Biplot of the first two principal components: biplot()
- Other techniques:
  - Interactive 3-dimensional plots
  - Plots based on multidimensional scaling (more about this later)
  - ...
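The plotting functions named above are all in base R; biplot() needs a fitted PCA object, e.g. from princomp(). A minimal sketch, using the built-in iris data purely as an assumed example dataset:

```r
## Sketch of the graphics covered, on the built-in iris data
X <- iris[, 1:4]                        # the four numeric measurements

pairs(X, col = as.numeric(iris$Species))                    # scatterplot matrix
stars(X[1:20, ])                                            # star/segment plots (first 20 flowers)
coplot(Sepal.Length ~ Sepal.Width | Species, data = iris)   # conditioning plot
biplot(princomp(X, cor = TRUE))                             # biplot of the first two PCs
```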

Principal component analysis (PCA)


- Main idea:
  - Start with variables X1, ..., Xp.
  - Find a rotation of these variables, say Y1, ..., Yp (called principal components), so that:
    - Y1, ..., Yp are uncorrelated. Idea: they measure different dimensions of the data.
    - Var(Y1) ≥ Var(Y2) ≥ ... ≥ Var(Yp). Idea: Y1 is most important, then Y2, etc.
- Method is based on the spectral decomposition of the covariance matrix.
- No need to make distributional assumptions.
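In R this is available as prcomp() (based on the singular value decomposition) or princomp() (based on the eigendecomposition of the covariance or correlation matrix). A minimal sketch, again with the iris measurements as an assumed example:

```r
## PCA of the four iris measurements
X   <- iris[, 1:4]
pca <- prcomp(X, scale. = TRUE)   # scale. = TRUE: work with the correlation matrix

summary(pca)           # proportion of variance explained by each component
pca$rotation           # the rotation (loadings) from X1, ..., Xp to Y1, ..., Yp
head(pca$x)            # the principal component scores

round(cor(pca$x), 10)  # the components are uncorrelated, with decreasing variance
```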

Possible uses of PCA


- Interest in the first principal component:
  - Example: How to combine the scores on 5 different examinations into a total score? Since the first principal component maximizes the variance, it spreads out the scores as much as possible.
- Interest in the 2nd to p-th principal components:
  - When all measurements are positively correlated, the first principal component is often some kind of average of the measurements (e.g., size of birds, severity index of psychiatric symptoms).
  - Then the other principal components give important information about the remaining pattern (e.g., shape of birds, pattern of psychiatric symptoms).

Possible uses of PCA


- Interest in the first few principal components:
  - Dimension reduction: summarize the data with a smaller number of variables, losing as little information as possible.
  - Can be used for graphical representations of the data (biplot).
- Use PCA as input for regression analysis:
  - Highly correlated explanatory variables are problematic in regression analysis.
  - One can replace them by their principal components, which are uncorrelated by definition (see the sketch below).
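A minimal sketch of the regression use: the response and the correlated predictors below are simulated purely for illustration and are not from the course material.

```r
## Principal components as input for regression (principal component regression)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)          # highly correlated with x1
x3 <- x1 + rnorm(n, sd = 0.1)
y  <- 2 * x1 + rnorm(n)

pca <- prcomp(cbind(x1, x2, x3), scale. = TRUE)
pc  <- pca$x[, 1:2]                    # keep the first two (uncorrelated) components

fit <- lm(y ~ pc)                      # regression on the components instead of x1, x2, x3
summary(fit)
```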

Factor analysis: idea


- Idea:
  - In the social sciences (e.g., psychology), it is often not possible to measure the variables of interest directly (e.g., intelligence, social class). Such variables are called latent variables or common factors.
  - Researchers examine such variables indirectly, by measuring variables that can be measured and that are believed to be indicators of the latent variables of interest (e.g., examination scores on various tests).
  - We want to relate the latent variables of interest to the measured variables.

Factor analysis: model


- Multiple linear regression model:

  x1 = λ11 f1 + ... + λ1k fk + u1
  x2 = λ21 f1 + ... + λ2k fk + u2
  ...
  xp = λp1 f1 + ... + λpk fk + up

  where
  - x = (x1, ..., xp) are the observed variables (random)
  - f = (f1, ..., fk) are the common factors (random)
  - u = (u1, ..., up) are the specific factors (random)
  - λij are the factor loadings (constants)
- Note: f1, ..., fk are not observed.
- Main goal: estimate the factor loadings.

Factor analysis
- Assumptions:
  - E(x) = 0 (if this is not the case, simply subtract the mean vector)
  - E(f) = 0, Cov(f) = I
  - E(u) = 0, Cov(ui, uj) = 0 for i ≠ j
  - Cov(f, u) = 0
- Estimation:
  - Under the above assumptions, Cov(x) = Σ = ΛΛ' + Ψ, where Λ is the matrix of factor loadings and Ψ is the (diagonal) covariance matrix of the specific factors.
  - Two estimation methods: principal factor analysis and maximum likelihood.
- Factor loadings are non-unique; factor rotation can be used to ease interpretation.
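In R, the maximum likelihood method is implemented in factanal(). A minimal sketch on simulated test scores; the data, the choice of one factor, and the varimax rotation are illustrative assumptions:

```r
## Maximum likelihood factor analysis with a varimax rotation
set.seed(1)
f <- rnorm(200)                                               # one unobserved common factor
X <- sapply(1:6, function(j) 0.8 * f + rnorm(200, sd = 0.6))  # six observed indicators
colnames(X) <- paste0("test", 1:6)

fa <- factanal(X, factors = 1, rotation = "varimax")
fa$loadings       # estimated factor loadings (Lambda)
fa$uniquenesses   # estimated specific variances (diagonal of Psi)
```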

Linear discriminant analysis


- Goal: Suppose that we have an n × p data matrix consisting of g different groups. How can we classify new observations into one of these groups? This is sometimes called supervised learning.
- Fisher's approach:
  - Look for the linear function Xa which maximizes the ratio of the between-groups sum of squares to the within-groups sum of squares.
  - Compute the average score x̄i'a for each group i = 1, ..., g.
  - Compute the score xnew'a for the new observation.
  - Classify the new observation into group j if |xnew'a − x̄j'a| < |xnew'a − x̄i'a| for all i ≠ j.

Linear discriminant analysis


- Maximum likelihood:
  - Suppose the exact distributions of the populations π1, ..., πg are known.
  - Then the maximum likelihood discriminant rule is to allocate an observation x to the population which gives the largest likelihood to x, i.e., to the population with the highest density at the point x.
  - If the exact distributions are unknown, but we know the shape of the distributions, then we can first estimate their parameters, and then use the above rule. This is the sample maximum likelihood discriminant rule.
- For two groups from two multivariate normal distributions with the same covariance matrix, Fisher's linear discriminant analysis equals the maximum likelihood rule.
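In R, Fisher's linear discriminant analysis is implemented in lda() from the MASS package. A minimal sketch on the iris data; the dataset and the random training/test split are illustrative assumptions:

```r
## Linear discriminant analysis with MASS::lda
library(MASS)
set.seed(1)

train <- sample(nrow(iris), 100)                  # random training set
fit   <- lda(Species ~ ., data = iris, subset = train)

pred <- predict(fit, newdata = iris[-train, ])    # classify the held-out flowers
table(predicted = pred$class, true = iris$Species[-train])
```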

Cluster analysis
- We have multivariate data without group labels.
- We want to see if there are clusters in the data, i.e., groups of observations that are homogeneous and separated from the other groups. This is sometimes called unsupervised learning.
- Methods we discussed (see the sketch below):
  - Hierarchical clustering
  - k-means clustering
  - Model-based clustering
- Possible applications:
  - Marketing: find groups of customers with similar behavior
  - Biology: classify plants or animals
  - Internet: cluster text documents
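A minimal sketch of the first two methods, which are in base R (hclust() and kmeans()); model-based clustering needs an extra package such as mclust. The iris measurements are again only an assumed example:

```r
## Hierarchical and k-means clustering of the iris measurements
X <- scale(iris[, 1:4])                    # standardize the variables

hc <- hclust(dist(X), method = "average")  # hierarchical clustering on Euclidean distances
plot(hc)                                   # dendrogram
cutree(hc, k = 3)                          # cut the tree into 3 clusters

km <- kmeans(X, centers = 3, nstart = 20)  # k-means with 3 clusters
table(cluster = km$cluster, species = iris$Species)
```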

Multidimensional scaling
- Not discussed in class.
- Goal: Construct a map from a distance matrix, where the map should represent the distances between the objects as accurately as possible.
- Possible applications:
  - Psychology/sociology: subjects say how similar/different pairs of objects are. Multidimensional scaling then creates a picture showing the overall relationships between the objects.
  - Can be used to aid clustering.
- See overhead slides and R-code.
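Classical (metric) multidimensional scaling is available in base R as cmdscale(). A minimal sketch with the built-in eurodist road distances; the dataset is only an illustrative choice:

```r
## A 2-dimensional map reconstructed from the eurodist distance matrix
mds <- cmdscale(eurodist, k = 2)

# The orientation of an MDS map is arbitrary; the second axis is flipped
# here so that north points up.
plot(mds[, 1], -mds[, 2], type = "n", asp = 1, xlab = "", ylab = "",
     main = "MDS map of European cities")
text(mds[, 1], -mds[, 2], labels = rownames(mds), cex = 0.7)
```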

More formal methods


Normal distribution theory
- Multivariate normal distribution
- Wishart distribution (for the sample covariance matrix)
- Hotelling's T² distribution (for the Mahalanobis distance, closely related to the F-distribution)
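A minimal simulation sketch tying these pieces together; the dimension, sample size, and parameter values are illustrative assumptions:

```r
## Multivariate normal sample, sample covariance matrix, Mahalanobis distances
library(MASS)
set.seed(1)

p <- 3; n <- 50
X <- mvrnorm(n, mu = rep(0, p), Sigma = diag(p))     # multivariate normal sample

S  <- cov(X)   # sample covariance matrix; (n - 1) * S follows a Wishart distribution
d2 <- mahalanobis(X, center = colMeans(X), cov = S)  # squared Mahalanobis distances

# For normal data the squared distances are approximately chi-square(p) distributed
qqplot(qchisq(ppoints(n), df = p), d2,
       xlab = "Chi-square quantiles", ylab = "Squared Mahalanobis distances")
```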

Tests of signicance for multivariate data


- Discussed in class:
  - Comparison of mean values for two samples, when the covariance matrices are assumed to be identical: multivariate T²-test (see the sketch below)
- Other tests:
  - Comparison of mean values for several samples
  - Comparison of mean values for several samples when the covariance matrices are not the same
  - Comparison of variation for two samples
  - Comparison of variation for several samples
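Base R has no ready-made two-sample Hotelling T² test (packages such as ICSNP or Hotelling provide one), but it can be coded directly from its definition. A minimal sketch under the equal-covariance assumption stated above; the function name and the iris example are my own:

```r
## Two-sample Hotelling T^2 test, assuming equal covariance matrices
hotelling_T2 <- function(X1, X2) {
  n1 <- nrow(X1); n2 <- nrow(X2); p <- ncol(X1)
  d  <- colMeans(X1) - colMeans(X2)
  Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)   # pooled covariance
  T2 <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(Sp) %*% d)
  Fstat <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2             # F-distributed under H0
  pval  <- pf(Fstat, df1 = p, df2 = n1 + n2 - p - 1, lower.tail = FALSE)
  c(T2 = T2, F = Fstat, p.value = pval)
}

hotelling_T2(as.matrix(iris[iris$Species == "setosa",     1:4]),
             as.matrix(iris[iris$Species == "versicolor", 1:4]))
```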

Canonical correlation
- We study the relationship between a group of variables Y1, ..., Yp and another group of variables X1, ..., Xq by searching for linear combinations ai'X and bi'Y that are most highly correlated.
- The size of Corr(ai'X, bi'Y) tells us about the strength of the relationship between X and Y.
- The loadings in ai and bi tell us about the type of relationship between X and Y.
- One can test whether the true canonical correlation is different from zero (not discussed in class).
- Possible application: find clusters among the variables (instead of among the observations).
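Base R provides cancor() for canonical correlation analysis. A minimal sketch, splitting the iris measurements into a sepal group and a petal group of variables (the grouping is an illustrative assumption):

```r
## Canonical correlation between the sepal variables and the petal variables
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width")])
Y <- as.matrix(iris[, c("Petal.Length", "Petal.Width")])

cc <- cancor(X, Y)
cc$cor     # canonical correlations: strength of the relationship
cc$xcoef   # loadings a_i for the X variables
cc$ycoef   # loadings b_i for the Y variables
```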

Remaining topics we did not cover


- MANOVA: multivariate version of ANOVA (analysis of variance)
- Multivariate regression: multivariate version of multiple regression (when doing least squares, the estimates are the same as when doing multiple regression for each dependent variable separately)
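Both topics are directly available in R: manova() fits the multivariate ANOVA, and lm() with a matrix response fits the multivariate regression. A minimal sketch on the iris data, used here only as an assumed example:

```r
## MANOVA: do the mean measurement vectors differ between species?
fit_manova <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
                     ~ Species, data = iris)
summary(fit_manova, test = "Wilks")

## Multivariate regression: a matrix response in lm(); the least squares
## coefficients equal those from separate per-response regressions
fit_mlm <- lm(cbind(Petal.Length, Petal.Width) ~ Sepal.Length + Sepal.Width, data = iris)
coef(fit_mlm)
```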
