
STATS 330: Lecture 2

Graphics

23.07.2014
Housekeeping

- Contact details

  Name            Office    Email (@auckland.ac.nz)   Office hours
  Steffen Klaere  303.219   s.klaere                  10:00-12:00, Thu
  Alan Lee        303S.265  aj.lee                    10:30-12:00, Tue+Thu

- Class representatives

  Name   Course   Email (@aucklanduni.ac.nz)
  ??     330      ??
  ??     762      ??

- Assignment 1 is due August 7


Today's Lecture: Exploratory graphics

- Today's lecture gives a quick overview of the kinds of graphs that can help in exploring data.
- Some of the material has been covered in Stats20x.
- We will discuss the R code used to make these plots in Lecture 4.
Exploratory Graphics: Topics

- One variable
  Aim: explore the distribution of values
  Plots: histograms, kernel density estimators, QQ plots

- Two variables
  Aim: explore the relationship between the variables
  Both continuous: scatter plot
  One of each: side-by-side box plots or violin plots
  Both categorical: mosaic plots (see Chapter 5)

- Three or more variables
  Aim: identify pairwise relationships, visualise GLMs
  Plots: pairs plots, rotating plots, coplots, 3D plots, contour plots
Single Variable: Exchange Rate Data

Data: daily changes in log(exchange rate) for USD/NZD
- Daily data from June 1986 to May 2014
- Source: Reserve Bank of New Zealand, http://www.rbnz.govt.nz/statistics/

Questions:
- What is the distribution of the daily changes in the logged exchange rate?
- Is it normal? If not, how does it differ?
Data analysis

 
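$$\Delta_t = \log(y_t) - \log(y_{t-1}) = \log\left(\frac{y_t}{y_{t-1}}\right),$$
where $y_t$ is the exchange rate on the $t$-th day of the series.
Suppose we have the data $\Delta_t$, $t = 2, \ldots, 6980$, in an R vector diff.in.logs.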
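A minimal sketch of how diff.in.logs could be computed with R's diff(), assuming the raw daily rates sit in a numeric vector called `rate` (a hypothetical name; the five values below are made up, not Reserve Bank data):

```r
# Hypothetical daily exchange rates; made-up values for illustration only.
rate <- c(0.5123, 0.5150, 0.5098, 0.5101, 0.5177)

# diff() of the logged series gives log(y_t) - log(y_{t-1}) for t = 2, ..., n,
# so the result has one element fewer than the input.
diff.in.logs <- diff(log(rate))

# The same values via the ratio form log(y_t / y_{t-1}).
ratio.form <- log(rate[-1] / rate[-length(rate)])
print(all.equal(diff.in.logs, ratio.form))
```

Applied to 6980 daily rates, this gives the 6979 values $\Delta_2, \ldots, \Delta_{6980}$.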
# Draw histogram on the density scale (about 100 bins)
hist(diff.in.logs, breaks = 100, freq = FALSE)
# Add kernel density estimate
lines(density(diff.in.logs), col = "blue", lwd = 2)
# Add fitted normal density over the range of the data
xvec <- seq(min(diff.in.logs), max(diff.in.logs), length.out = 100)
lines(xvec, dnorm(xvec, mean = mean(diff.in.logs), sd = sd(diff.in.logs)),
      col = "red", lwd = 2)
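What density() estimates can be sketched directly from the kernel density formula $\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} \phi\left(\frac{x - x_i}{h}\right)$ with a Gaussian kernel $\phi$. This sketch uses simulated normal data rather than diff.in.logs, and bw.nrd0(), which is density()'s default bandwidth rule; kde() is a hypothetical helper. density() itself uses a faster binned approximation, so the two agree only approximately.

```r
# Gaussian kernel density estimate written out directly:
#   f_hat(x) = (1 / (n * h)) * sum_i dnorm((x - x_i) / h)
kde <- function(x.grid, data, h) {
  sapply(x.grid, function(x0) mean(dnorm((x0 - data) / h)) / h)
}

set.seed(42)
z <- rnorm(200)          # simulated stand-in data, not the exchange-rate series
h <- bw.nrd0(z)          # the default bandwidth used by density()

grid.pts <- seq(min(z) - 4 * h, max(z) + 4 * h, length.out = 2000)
f.hat <- kde(grid.pts, z, h)

# A density estimate should integrate to (about) 1.
area <- sum(f.hat) * diff(grid.pts[1:2])
print(area)

# Compare with density() at its own evaluation points.
d <- density(z)
print(max(abs(d$y - kde(d$x, z, h))))
```

The bandwidth h controls the smoothness: larger values give a smoother curve, smaller values a bumpier one.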
[Figure: Histogram of diff.in.logs on the density scale (y-axis: Density; x-axis: diff.in.logs, about -0.10 to 0.05), with the kernel density estimate (blue) and fitted normal density (red)]
Normal QQ Plot

qqnorm(diff.in.logs)
qqline(diff.in.logs, lwd = 3)

[Figure: Normal QQ plot of diff.in.logs (x-axis: Theoretical Quantiles, -4 to 4; y-axis: Sample Quantiles, about -0.10 to 0.05) with the qqline reference line]

- Normal data? NO
- The QQ plot indicates that the differences have longer tails than a normal distribution
- Plotted points are below the line for smaller values and above the line for larger values.
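The coordinates that qqnorm() plots, and the line qqline() adds by default, can be reproduced by hand. A minimal sketch on simulated heavy-tailed data (a t distribution with 3 degrees of freedom, standing in for diff.in.logs): the ordered sample is plotted against normal quantiles at the plotting positions ppoints(n), and the reference line passes through the first and third quartiles.

```r
# Simulated heavy-tailed stand-in (t distribution, 3 df); NOT the exchange-rate data.
set.seed(2014)
x <- rt(5000, df = 3)

# What qqnorm() plots: normal quantiles at the plotting positions ppoints(n)
# against the ordered sample.
theo <- qnorm(ppoints(length(x)))
samp <- sort(x)

# qqline()'s default reference line passes through the first and third quartiles.
q.samp <- unname(quantile(x, c(0.25, 0.75)))
q.theo <- qnorm(c(0.25, 0.75))
slope <- diff(q.samp) / diff(q.theo)
intercept <- q.samp[1] - slope * q.theo[1]
line.at <- intercept + slope * theo

# Longer tails than normal: the smallest points fall below the line and the
# largest points lie above it.
below.left <- samp[1] < line.at[1]
above.right <- samp[length(samp)] > line.at[length(line.at)]
print(c(below.left = below.left, above.right = above.right))

plot(theo, samp, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles")
abline(intercept, slope, lwd = 3)
```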


Two variables: Rats!

Given: growth data for 16 rats, i.e. the relationship between weight and time.
- We want to explore the relationship graphically.
- Each rat was measured (roughly) every week for 10 weeks.
- For weeks 1-5, all rats were on a fixed diet.
- The diet was changed after week 6.
Two variables: Rats!

Dataset rats.df has variables
- growth: weight in grams
- group: litter, labelled 1-3
- rat: individual rat, labelled 1-16
- change: diet, labelled 1-2. Diet was changed after 6 weeks: diet 1 for weeks 1-5, diet 2 for weeks 6-10.
- day: days since the start of the study; 11 values at approximately weekly intervals.
Rats! Simple visualisation

[Figure: Weight (grams, about 300 to 600) against Time (days, 0 to about 60)]
Rats! More sophisticated visualisation
[Figure: "Growth rates for rats": Weight (grams) against Time (days), with Litter 1, Litter 2 and Litter 3 distinguished]
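A plot like this can be drawn in base R with one line per rat, coloured by litter. The lecture's rats.df is not reproduced here, so this sketch builds a simulated stand-in with the same column names (growth, group, rat, day); the litter allocation, weights and weighing days are all invented.

```r
# Simulated stand-in for rats.df (NOT the study data): 16 rats in 3 litters,
# each weighed on 11 days at weekly intervals.
set.seed(330)
n.rats <- 16
rats.df <- expand.grid(day = seq(0, 70, by = 7), rat = 1:n.rats)
litter.of.rat <- rep(1:3, length.out = n.rats)     # hypothetical litter allocation
rats.df$group <- litter.of.rat[rats.df$rat]
start <- rnorm(n.rats, mean = 350, sd = 25)         # hypothetical starting weights
gain <- rnorm(n.rats, mean = 3, sd = 0.5)           # hypothetical grams per day
rats.df$growth <- start[rats.df$rat] + gain[rats.df$rat] * rats.df$day +
  rnorm(nrow(rats.df), sd = 5)

# Empty frame first, then one line per rat coloured by litter.
plot(growth ~ day, data = rats.df, type = "n",
     xlab = "Time (days)", ylab = "Weight (grams)", main = "Growth rates for rats")
for (r in unique(rats.df$rat)) {
  one.rat <- rats.df[rats.df$rat == r, ]
  lines(one.rat$day, one.rat$growth, col = one.rat$group[1])
}
legend("topleft", legend = paste("Litter", 1:3), col = 1:3, lty = 1)
```

Joining each rat's measurements with a line makes individual growth curves visible, which a plain scatter of all points hides.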
Rats! And more sophistication
[Figure: 4 x 4 grid of panels with strip label "rat", one panel per rat, each showing Weight (grams) against Time (days)]
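One way to get one panel per rat is lattice's xyplot() with rat as the conditioning variable. A sketch on a simulated stand-in for rats.df (invented values, same column names); lattice treats a numeric conditioning variable as a shingle, so each strip reads "rat" with a shaded bar marking the level, which matches the strip labels in the figure. Whether the lecture used exactly this call is an assumption.

```r
library(lattice)

# Simulated stand-in for rats.df (invented values, NOT the study data):
# 16 rats, each weighed on 11 days at weekly intervals.
set.seed(330)
rats.df <- expand.grid(day = seq(0, 70, by = 7), rat = 1:16)
rat.start <- rnorm(16, mean = 350, sd = 25)        # hypothetical starting weights
rats.df$growth <- rat.start[rats.df$rat] + 3 * rats.df$day +
  rnorm(nrow(rats.df), sd = 8)

# One panel per rat, arranged 4 x 4, points joined by lines (type = "b").
p <- xyplot(growth ~ day | rat, data = rats.df, type = "b", layout = c(4, 4),
            xlab = "Time (days)", ylab = "Weight (grams)")
print(p)
```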
Rats! And more sophistication

[Figure: 3 rows of 8 panels conditioned on group and within.group (strip labels), each showing Weight (grams) against Time (days)]
One continuous and one categorical variable

- Measurement of mouse body temperature every 15 minutes over a period of 25 days
- We wish to visualise the relationship between time of day (categorical) and body temperature (continuous)
- Side-by-side boxplots
The fever of mice
[Figure: side-by-side boxplots of body temperature (36.0 to 39.0) against day time (00:00 to 22:45 labelled on the axis), with a LOESS smooth]
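Side-by-side boxplots with a LOESS smooth can be sketched in base R with boxplot() and loess(). The mouse data are not available here, so the temperatures below are simulated with an arbitrary daily cycle, invented only so the plot has something to show.

```r
# Simulated stand-in for the mouse data (invented values): one reading every
# 15 minutes for 25 days.
set.seed(330)
n.days <- 25
slot <- rep(0:95, times = n.days)          # 15-minute slot within the day, 0 to 95
hour <- slot / 4                           # time of day in hours
temp <- 37.2 + 0.6 * cos(2 * pi * (hour - 2) / 24) +
  rnorm(length(slot), sd = 0.3)

# Categorical time of day, "00:00" to "23:45"; zero-padding keeps levels in clock order.
time.of.day <- factor(sprintf("%02d:%02d", slot %/% 4L, 15L * (slot %% 4L)))

# One box per time slot.
boxplot(temp ~ time.of.day, xlab = "day time", ylab = "body temperature",
        main = "The fever of mice")

# LOESS smooth of temperature on slot number; box k sits at x = k, i.e. slot k - 1.
fit <- loess(temp ~ slot, span = 0.3)
slots <- 0:95
temp.smooth <- predict(fit, newdata = data.frame(slot = slots))
lines(slots + 1, temp.smooth, col = "red", lwd = 2)
```

Treating time of day as a factor gives 96 boxes; the smooth uses the underlying slot number so that it can be drawn as a continuous curve across them.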
More than two variables
- If all variables are continuous, we can explore the relationships between them using a pairs plot.
- If we have three variables, a rotating plot is a very useful tool.
- Example: cherry trees
Pairs Plot for Cherry Trees
[Figure: pairs plot (scatter plot matrix) of diameter (about 8 to 20), height (about 65 to 85) and volume (about 10 to 70)]
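A pairs plot is a single call in base R. This sketch assumes cherry.df corresponds to R's built-in trees data (31 black cherry trees, whose Girth column is documented as actually being the diameter), with the columns renamed to match the lecture; the correlation matrix is an optional numeric companion to the plot.

```r
# Assumption: cherry.df matches R's built-in trees data, columns renamed.
cherry.df <- setNames(datasets::trees, c("diameter", "height", "volume"))

# Scatter plot of every pair of variables.
pairs(cherry.df, main = "Pairs Plot for Cherry Trees")

# Pairwise correlations as a numeric summary of the same relationships.
print(round(cor(cherry.df), 2))
```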
3D Rotating plots

- The challenge: to represent a 3-dimensional object on a 2-dimensional surface (such as current screens)
- Traditional methods use projection and perspective
- A powerful idea is to use motion, looking at the 3D scene from different angles
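The idea behind a rotating plot can be sketched without any 3D library: rotate the points about the vertical axis, then project them onto the screen plane by dropping the depth coordinate. A rotating plot shows a rapid sequence of such views; this sketch draws four of them as static panels. It assumes cherry.df matches R's built-in trees data (columns renamed), and project() is a hypothetical helper, not an existing function.

```r
# Assumption: cherry.df matches R's built-in trees data, columns renamed.
cherry.df <- setNames(datasets::trees, c("diameter", "height", "volume"))

# Hypothetical helper: rotate the points about the vertical (volume) axis by
# angle theta, then drop the depth coordinate (orthogonal projection).
project <- function(xyz, theta) {
  xyz <- as.matrix(xyz)
  screen.x <- xyz[, 1] * cos(theta) - xyz[, 2] * sin(theta)
  cbind(screen.x = screen.x, screen.y = xyz[, 3])
}

# Standardise first so that rotation mixes variables on comparable scales.
std <- scale(cherry.df)

op <- par(mfrow = c(2, 2))
for (theta in c(0, pi / 6, pi / 3, pi / 2)) {
  plot(project(std, theta), xlab = "horizontal screen axis",
       ylab = "volume (standardised)",
       main = sprintf("rotated %.0f degrees", theta * 180 / pi))
}
par(op)
```

At 0 degrees the view is the diameter-volume projection; at 90 degrees the horizontal axis shows minus height. Intermediate angles blend the two, which is what a rotating display animates.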
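Perspective and projection

- Perspective: a 3D scatter plot of the cherry tree data
    library(rgl)
    plot3d(cherry.df)
- Projection: the 2D scatter plot of volume against diameter
    plot(volume ~ diameter, data = cherry.df)

[Figure: Volume (about 10 to 70) against Diameter (about 8 to 20)]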
Dynamic motion

- reg3d(cherry.df, wire=TRUE)
- By dynamically changing the angle of view, we get a better impression of the 3-dimensional structure of the data
- Dynamic graphics is a very powerful tool
A powerful idea: Coplots

- Coplots show the relationship between x and y for selected values of z (usually a narrow range of z)
- By showing separate plots for different ranges of z, we can see how the relationship between x and y changes as z changes
- Coplot: conditioning plot; it shows the relationship between x and y conditional on z (i.e. for fixed z)
Cherry trees: coplots

- To show the relationship between height and volume for different values of diameter:
- Divide the range of diameter (8.3 to 20.6) into 6 subranges: 8-11, 10.5-11.5, etc.
- Draw 6 plots, the first using all data whose diameter is between 8 and 11, the second using all data whose diameter is between 10.5 and 11.5, and so on
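In base R these steps correspond to co.intervals() and coplot(). A sketch, assuming cherry.df matches R's built-in trees data with the columns renamed; co.intervals() chooses overlapping intervals holding roughly equal numbers of trees, so its six subranges need not match the ones listed above exactly.

```r
# Assumption: cherry.df matches R's built-in trees data, columns renamed.
cherry.df <- setNames(datasets::trees, c("diameter", "height", "volume"))

# Six overlapping diameter intervals (one row per interval: lower, upper).
ints <- co.intervals(cherry.df$diameter, number = 6, overlap = 0.5)
print(ints)

# One panel of volume against height for each diameter interval.
coplot(volume ~ height | diameter, data = cherry.df, given.values = ints)
```

These co.intervals() settings are also coplot()'s defaults for a numeric conditioning variable, so passing given.values here mainly makes the chosen intervals visible.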
Cherry trees: coplots
[Figure: coplot of volume (about 10 to 70) against height (about 65 to 85), given diameter; the top panel ("Given : diameter", 10 to 20) shows the overlapping diameter intervals]
Concluding Lecture 2

