# VISUALIZING CORRELATION MATRICES IN R

In multivariate analysis, we are often interested in the relationship between different variables, especially correlation.

Correlation measures the degree of linear association between variables. Broadly speaking, it captures both the
direction of the relationship (whether the variables move together or in opposite directions) and its strength. Two
variables move in the same direction when one increases as the other increases, or one decreases as the other
decreases. They move in opposite directions when one increases as the other decreases. Variables that move in the
same direction are said to be positively correlated, while those that move in opposite directions are negatively
correlated.

The coefficient of correlation, r, takes a value between -1 and 1. The sign gives the direction of the relationship and
the absolute value gives its strength: 0 means no linear correlation between two variables, while 1 or -1 means perfect
correlation between them. Mostly, one will find a value that lies somewhere in between. A value of 0.21 implies low
correlation whereas a value of 0.87 implies high correlation. Where 0.87 signifies high positive correlation, -0.91
describes high negative correlation.
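The sign and size of r can be checked on a few toy vectors. The sketch below uses made-up values (x, y_up, y_down and y_curve are hypothetical names chosen only for illustration) and base R's cor function, which computes the Pearson coefficient by default.

```r
# Toy data: made-up values chosen only to illustrate the sign and size of r
x <- 1:10
y_up <- 2 * x + 1         # rises exactly in step with x
y_down <- -3 * x + 5      # falls exactly as x rises
y_curve <- (x - 5.5)^2    # strongly related to x, but not linearly

r_up <- cor(x, y_up)        # perfect positive linear relationship
r_down <- cor(x, y_down)    # perfect negative linear relationship
r_curve <- cor(x, y_curve)  # symmetric curve: no linear association

print(c(r_up = r_up, r_down = r_down, r_curve = r_curve))
```

The last case is a reminder that r only measures linear association: y_curve depends entirely on x, yet its correlation with x is essentially zero.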

If our analysis involves just two variables, computing the correlation and presenting it to the intended audience is
easy. It is a single number between -1 and 1, say 0.56. However, when our analysis involves many variables, it becomes
slightly more complex. Here, we want the correlation coefficient between each variable and every other variable.
Again, the pairwise calculation is the easy part, but presenting the results is a slightly more complex affair.

Usually, such multivariate correlation analysis is presented in what is known as a correlation matrix, where every
variable appears in both the rows and the columns, and the coefficient for a pair of variables sits in the cell where
the row of one meets the column of the other. Here is what a correlation matrix looks like:

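|   | A    | B    | C    | D    | E    | F     | G |
|---|------|------|------|------|------|-------|---|
| A | 1    |      |      |      |      |       |   |
| B | 0.65 | 1    |      |      |      |       |   |
| C | 0.91 | 0.14 | 1    |      |      |       |   |
| … |      |      |      |      |      |       |   |
| G | 0.74 | 0.28 | 0.48 | 0.73 | 0.39 | -0.19 | 1 |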

Here, we have 7 variables, A to G, listed in both rows and columns, and the correlation coefficient for any pair of
variables is found where the row of one meets the column of the other. Notice that the diagonal shows a value of 1, as
the correlation of a variable with itself is always 1. We can see that A and C are highly positively correlated (0.91)
whereas A and F are highly negatively correlated (-0.95).

Notice that the upper triangle of this matrix is empty. The correlation between A and B is the same as the correlation
between B and A, so these cells would only repeat the numbers in the lower triangle. The upper triangle is nothing but
a mirror image of the lower triangle, so filling in both would only make the matrix look more complicated and
confusing.
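The mirror-image property is easy to verify in R. The following is a minimal sketch, assuming three columns picked arbitrarily from R's built-in mtcars dataset; the object names vars, m and lower_only are illustrative only.

```r
# Three columns picked arbitrarily from the built-in mtcars dataset
vars <- mtcars[, c("mpg", "wt", "hp")]
m <- round(cor(vars), 2)

# cor(a, b) equals cor(b, a), so the matrix is symmetric
print(isSymmetric(m))

# Blank out the redundant upper triangle for presentation
lower_only <- m
lower_only[upper.tri(lower_only)] <- NA
print(lower_only, na.print = "")
```

Setting the upper triangle to NA and printing with na.print = "" gives the same lower-triangle layout as the table above.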

Right then, the matrix looks perfect, but is there a better way to present it and impress your audience? Of course
there is. R provides an effective way to make attractive correlation matrices. Here is a step-by-step guide to making a
basic correlation matrix plot and introducing some interesting tweaks to enhance it.

In this guide, we are going to work with a dataset that comes pre-loaded in R, called mtcars. Alternatively, you can
load your own dataset and work with it.

The dataset contains data extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10
aspects of automobile design and performance for 32 automobiles (1973–74 models). It contains the following 11
variables: mpg (miles per gallon), cyl (number of cylinders), disp (displacement), hp (gross horsepower), drat (rear
axle ratio), wt (weight), qsec (1/4 mile time), vs (engine shape), am (transmission), gear (number of forward gears),
and carb (number of carburetors).
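Before computing anything, it can help to take a quick look at the data. This is a small sketch using only base R functions; all_numeric is an illustrative name, not part of the original guide.

```r
# mtcars ships with R (datasets package), so no import step is needed
print(dim(mtcars))      # rows (cars) and columns (variables)
print(names(mtcars))    # the variable names listed above
str(mtcars)             # type and first few values of each variable
print(head(mtcars, 3))  # first three cars

# cor() needs numeric input, so confirm that every column is numeric
all_numeric <- all(sapply(mtcars, is.numeric))
print(all_numeric)
```

Because every column is numeric, the whole data frame can be passed to cor without dropping or converting any variables.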

## Step 1: Install the package ‘ggcorrplot’

install.packages("ggcorrplot")

## Step 2: Load the package in R

library(ggcorrplot)
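If the script is going to be run more than once, Steps 1 and 2 can be combined so that the package is downloaded only when it is missing. This is one possible sketch, not the only way to do it; it relies on base R's requireNamespace function.

```r
# Install ggcorrplot only if it is not already available on this machine
if (!requireNamespace("ggcorrplot", quietly = TRUE)) {
  install.packages("ggcorrplot")
}

# Attach the package so that ggcorrplot() can be called directly
library(ggcorrplot)
```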

## Step 3: Create the correlation matrix

cormax <- cor(mtcars)

Here, we have created a correlation matrix of all variables in the mtcars dataset and stored it in an object called
cormax. However, there is a problem: if you print cormax, you will notice that the correlations are shown to many
decimal places, which certainly does not make for easy reading. To fix this, use the round function:

cormax <- round(cor(mtcars), 1)

This rounds the numbers in the matrix to 1 decimal place, which makes your matrix look much cleaner.
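The second argument of round controls how many decimal places are kept. As a small illustration on a single pair of variables (r_full, r_1dp, r_2dp and cormax_2dp are names made up for this sketch):

```r
# Full-precision correlation between fuel economy and weight
r_full <- cor(mtcars$mpg, mtcars$wt)

r_1dp <- round(r_full, 1)  # keep one decimal place
r_2dp <- round(r_full, 2)  # keep two decimal places

print(c(full = r_full, one_decimal = r_1dp, two_decimals = r_2dp))

# round() works element-wise, so it can be applied to the whole matrix
cormax_2dp <- round(cor(mtcars), 2)
print(cormax_2dp[1:3, 1:3])
```

One decimal place keeps the plot labels short, while two decimal places preserve a little more detail; which to use is a presentation choice.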

## Step 4: Visualise the matrix

Now it is time to turn the numbers into a picture. Type the following command:

ggcorrplot(cormax)

Wow, this looks great, so much better than pure, drab numbers. The legend on the right explains the color scheme: the
darker the red, the stronger the positive correlation, and the darker the purple, the stronger the negative
correlation. Now that we have our basic correlation matrix visual, let's tweak it a bit more.
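The color scheme itself can also be changed. The sketch below assumes the colors argument of ggcorrplot, which takes three colors for low, middle and high correlation values, and the legend.title argument; the hex codes are an arbitrary choice for illustration.

```r
library(ggcorrplot)

cormax <- round(cor(mtcars), 1)

# colors = c(low, mid, high): strong negative, no correlation, strong positive
p_colors <- ggcorrplot(
  cormax,
  colors = c("#6D9EC1", "white", "#E46726"),  # arbitrary example palette
  legend.title = "Correlation"
)
print(p_colors)
```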

## Step 5: Introduce circles

What if instead of squares in the visual, you want to show circles? Easy.

ggcorrplot(cormax, method = "circle")

Looks good? Well, it's purely a personal choice. Here, you can see that the size of each circle also varies with the
strength of the correlation.

## Step 6: Show only the lower triangle

As discussed earlier, showing both triangles of the matrix makes it more confusing and adds more detail than required.
Let's correct that by showing only the lower triangle.

ggcorrplot(cormax, type = "lower")

If you want the upper triangle instead of the lower triangle, change the type argument in the command to "upper".
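For the upper triangle, the call would presumably be the same as in Step 6 with only the type value changed. A minimal sketch (p_upper is an illustrative name):

```r
library(ggcorrplot)

cormax <- round(cor(mtcars), 1)

# type accepts "full" (the default), "lower" or "upper"
p_upper <- ggcorrplot(cormax, type = "upper")
print(p_upper)
```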

## Step 7: Add labels to the correlation matrix

What’s missing now? Although you get an idea of how strong or weak the correlation between two variables is, the
exact number is not visible. To show it, add lab = TRUE as an argument in your ggcorrplot command.

ggcorrplot(cormax, type = "upper", lab = TRUE)

There we go. We now have a complete correlation matrix that looks as good as it gets. It gives a complete picture of
the correlations between variables at a glance and will make your explanatory analysis that much more appealing.
Happy visualizing.
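The individual tweaks can be combined in a single call, and since ggcorrplot returns a ggplot object, the result can be written to a file with ggsave from ggplot2. The sketch below is one possible combination; the file name and the image dimensions are hypothetical choices, not part of the original guide.

```r
library(ggcorrplot)
library(ggplot2)  # provides ggsave(); installed alongside ggcorrplot

cormax <- round(cor(mtcars), 1)

# Circles, lower triangle only, with the coefficients printed in each cell
p_final <- ggcorrplot(cormax, method = "circle", type = "lower", lab = TRUE)
print(p_final)

# Hypothetical output location; change the path and size to suit your report
out_file <- file.path(tempdir(), "mtcars_correlation.png")
ggsave(out_file, plot = p_final, width = 6, height = 6, dpi = 150)
```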
