
Prelim.

Data

Descriptive Statistics

Prob. Models

Hyp. Test & CI

Linear Reg.

Resources

Upcoming

Exercises

UCLA Department of Statistics


Statistical Consulting Center

Introductory Statistics with R


Mine Çetinkaya
mine@stat.ucla.edu

February 1, 2010


Outline

Preliminaries

Data sets

Descriptive Statistics

Probability Models

Hypothesis Testing and Confidence Intervals

Linear Regression

Online Resources for R

Upcoming Mini-Courses

Exercises


Preliminaries
Software Installation
R Help


Software Installation

Installing R on a Mac

1. Go to http://cran.r-project.org/ and select MacOS X.
2. Select the latest version to download: 2.11.0 (2010-04-22).
3. Install and open R. The R window should look like this:

[Screenshot of the R console window]


R Help

For help with any function in R, put a question mark before the function name. The help page lists the arguments to use, examples, and background information.

?plot


Data sets
Loading data into R
Viewing data sets in R


Loading data into R

Loading a data set into R:

survey = read.table("http://www.stat.ucla.edu/~mine/students_survey_2008.txt",
                    header = TRUE, sep = "\t")

Displaying the dimensions of the data set:

dim(survey)

[1] 1325   29
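If the URL above is unavailable, the same call can be tried on a small inline example. The sketch below assumes a made-up, three-row, tab-delimited table (the values are illustrative, not taken from the survey) and passes it through the text argument of read.table in place of a file or URL.

```r
# Hypothetical tab-delimited data; not the actual survey file
txt = "gender\thand\tageinmonths\nfemale\tleft\t228\nmale\tright\t235\nfemale\tright\t243"

# header = TRUE uses the first line as column names; sep = "\t" splits on tabs
toy = read.table(text = txt, header = TRUE, sep = "\t")

dim(toy)    # number of rows, then number of columns
names(toy)  # column names taken from the header line
```

As with the survey data, dim() reports rows first and columns second.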


Viewing data sets in R


Displaying the first 3 rows and 5 columns of the data set:

survey[1:3, 1:5]

  gender  hand eyecolor glasses california
1 female  left    hazel     yes        yes
2   male right    brown      no         no
3 female right    brown     yes        yes

Displaying the variable names in the data set:

names(survey)

 [1] "gender"      "hand"        "eyecolor"    "glasses"     "california"
 [6] "birthmonth"  "birthday"    "birthyear"   "ageinmonths" "height"
[11] "graduate"    "oncampus"    "time"        "walk"        "hsclass"
...


Attaching / detaching data frames in R


Attaching the variables in a data set:

attach(survey)

The following object(s) are masked from package:datasets :

    sleep

The message tells us that the attached data frame contains a column named sleep, which is also the name of a data set in the datasets package. If you type:

sleep

the column in the attached data frame is found before the other object with the same name, which sits lower in the search() path. Thus, your object is masking the other.
To detach a data frame, i.e. remove it from the search() path of available R objects, use detach (but we won't do that now):

detach(survey)
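Masking can be avoided altogether by not attaching the data frame. The sketch below assumes a small made-up data frame named toy (a stand-in for the survey data) and reaches its columns with $ and with(), both of which leave the search() path unchanged.

```r
# Hypothetical data frame standing in for the survey data
toy = data.frame(sleep = c(7, 6, 8), height = c(64, 70, 66))

# Refer to a column explicitly, without changing the search() path
mean(toy$sleep)

# Evaluate an expression with the columns of toy visible only inside the call
avg.sleep = with(toy, mean(sleep))
avg.sleep
```

Outside these calls, typing sleep still refers to the data set in the datasets package.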


Descriptive Statistics
Variable classes
Displaying categorical data
Displaying quantitative data
Describing distributions numerically


Variable classes

Displaying the class of a variable:

class(instructor)

[1] "factor"

Changing the class of a variable:

instructor = as.character(instructor)
class(instructor)

[1] "character"


Displaying categorical data

Tables

Tables are useful for displaying the distribution of categorical variables.

table(gender)

gender
female   male
   882    443


Contingency tables

Contingency tables display two categorical variables at a time.

table(gender, hand)

        hand
gender   ambidextrous left right
  female            9   67   806
  male             11   45   387
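Counts in a contingency table are often easier to compare as totals and proportions. The sketch below re-enters the counts printed above by hand (so it runs without the survey file) and uses addmargins() and prop.table(); the object names counts, tab and row.props are chosen here for illustration.

```r
# Counts copied from the contingency table above
counts = matrix(c(9, 67, 806,
                  11, 45, 387),
                nrow = 2, byrow = TRUE,
                dimnames = list(gender = c("female", "male"),
                                hand = c("ambidextrous", "left", "right")))
tab = as.table(counts)

addmargins(tab)                          # append row and column totals
row.props = prop.table(tab, margin = 1)  # proportions within each gender
round(row.props, 3)
```

With the data attached, the same functions can be applied directly to table(gender, hand).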


Frequency bar plots


Display counts of each category next to each other for easy comparison.

barplot(table(gender), main = "Barplot of Gender")

[Figure: bar plot titled "Barplot of Gender", with one bar each for female and male]


Relative frequency bar plots


Display relative proportions of each category.

barplot(table(gender) / length(gender), main = "Relative Frequency\nBarplot of Gender")

[Figure: bar plot titled "Relative Frequency Barplot of Gender", with one bar each for female and male]


Segmented bar charts


Displays two categorical variables at a time.

barplot(table(gender, hand), col = c("skyblue", "blue"),
        main = "Segmented Bar Plot\nof Gender")
legend("topleft", c("females", "males"), col = c("skyblue", "blue"),
       pch = 16, inset = 0.05)

[Figure: segmented bar plot titled "Segmented Bar Plot of Gender", with bars for ambidextrous, left and right, each split into females and males]


Pie charts
Pie charts display counts as percentages of individuals in each category.

pct = round(table(gender) / length(gender) * 100)
lbls = paste(names(table(gender)), "\n", "%", pct)
pie(table(gender), labels = lbls)

[Figure: pie chart with slices labeled "female % 67" and "male % 33"]

Displaying quantitative data

Histograms
Display the number of cases in each bin.

hist(ageinmonths, main = "Histogram of Age in Months")

[Figure: histogram titled "Histogram of Age in Months"; x-axis: ageinmonths, y-axis: Frequency]


Relative frequency histograms


Display the proportion of cases in each bin. With freq = FALSE the y-axis shows density, so the area of each bar (not its height) is the proportion of cases in that bin.

hist(ageinmonths, freq = FALSE, main = "Relative Frequency\nHistogram of Age in Months",
     xlab = "Age in Months")

[Figure: histogram titled "Relative Frequency Histogram of Age in Months"; x-axis: Age in Months, y-axis: Density]


Stem-and-Leaf Plots
Preserve individual data values.

stem(ageinmonths)

  The decimal point is 1 digit(s) to the right of the |

  20 | 48
  21 | 004444555566666666666666666777777777778888888888889999999999999999
  22 | 00000000000000000000000000111111111111111122222222222222222222333333+258
  23 | 00000000000000000000000000000000000000000000001111111111111111111111+379
  24 | 00000000000000000000000000000000000000000000111111111111111111111111+170
  25 | 00000000000001111111111111112222222222222222222223333333344444444445+24
  26 | 000000000001111111111222222333334444444444556666778889
  27 | 00111222222344566789
  28 | 01334558888
  29 | 0004569
  30 | 267
  31 | 02257
  32 | 44
  33 | 5
  34 | 89
  35 | 3


Boxplots
boxplot(ageinmonths, main = "Boxplot of Age in Months")

[Figure: boxplot titled "Boxplot of Age in Months"; axis from 200 to 350]

Five Number Summary (Min, Q1, Median, Q3, Max):

fivenum(ageinmonths)

[1] 204 228 235 243 353


Describing distributions numerically

Summary
Categorical variables:

summary(hand)

ambidextrous         left        right
          20          112         1193

Quantitative variables:

summary(ageinmonths)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  204.0   228.0   235.0   237.8   243.0   353.0


Measures of center
Mean (arithmetic average):

mean(ageinmonths)

[1] 237.8309

Median (value that divides the histogram into two equal areas):

median(ageinmonths)

[1] 235

Mode (the most frequent value), for discrete data:

as.numeric(names(sort(table(ageinmonths), decreasing = TRUE))[1])

[1] 228
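The one-liner above reports a single value even when several values are tied for most frequent. A minimal sketch of a helper that returns all tied values follows; the function name modes and the example vectors are made up for illustration and are not part of the original slides.

```r
# Hypothetical helper: returns every value tied for the highest frequency
modes = function(x) {
  counts = table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

ages = c(228, 235, 228, 240, 235, 228)  # made-up values with a single mode
modes(ages)

ties = c(1, 1, 2, 2, 3)                 # made-up values with two modes
modes(ties)
```

Like the one-liner, this assumes discrete numeric data, since the values are recovered from the names of the table.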

Mode (alternative)

To find the mode, you may also use the Mode function in the prettyR package.

install.packages("prettyR")
library(prettyR)
Mode(ageinmonths)

[1] "228"


Adding measures to plots


Adding mean and median to a histogram.

hist(ageinmonths, main = "Histogram of Age in Months")
abline(v = mean(ageinmonths), col = "blue")
abline(v = median(ageinmonths), col = "green")
legend("topright", c("Mean", "Median"), pch = 16, col = c("blue", "green"))

[Figure: histogram titled "Histogram of Age in Months" with vertical lines at the mean and the median; x-axis: ageinmonths, y-axis: Frequency]


Measures of spread
Range (Min, Max):

range(ageinmonths)

[1] 204 353

IQR:

IQR(ageinmonths)

[1] 15

Standard deviation:

sd(ageinmonths)

[1] 16.03965
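The three measures can be tied back to their definitions. The sketch below uses a short made-up vector x (not the survey data) to show how the range, IQR and standard deviation relate to quantile(), var() and the defining formula.

```r
x = c(204, 228, 235, 243, 353)  # made-up values for illustration

diff(range(x))                  # range as a single number: max minus min
q = quantile(x, c(0.25, 0.75))  # first and third quartiles
unname(q[2] - q[1])             # same value as IQR(x)
sqrt(var(x))                    # same value as sd(x)
sqrt(sum((x - mean(x))^2) / (length(x) - 1))  # sd written out from its definition
```

Note that sd() divides by n - 1, not n.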


Probability Models
Geometric
Binomial
Poisson
Normal


Geometric distribution
If the probability of success is 0.35, what is the probability that the first success will be on the 5th trial?

dgeom(4, 0.35)

[1] 0.06247719

The first argument is 4, not 5, because dgeom takes the number of failures before the first success.

Note: dgeom gives the density (or probability mass function for discrete variables), pgeom gives the distribution function, qgeom gives the quantile function, and rgeom generates random deviates. This is true for the functions used for Binomial, Poisson and Normal calculations as well.
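The value returned by dgeom can be checked against the geometric probability formula. The sketch below is a hand check, assuming the same success probability of 0.35; the variable name by.hand is chosen here for illustration.

```r
p = 0.35

# P(first success on trial 5) = P(4 failures in a row) * P(success)
by.hand = (1 - p)^4 * p
by.hand
dgeom(4, p)        # same value

# P(first success within the first 5 trials): at most 4 failures
pgeom(4, p)
1 - (1 - p)^5      # complement of five failures in a row, same value
```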


Binomial distribution
If the probability of success is 0.35, what is the probability of

3 successes in 5 trials?

dbinom(3, 5, 0.35)

[1] 0.1811469

at least 3 successes in 5 trials?

sum(dbinom(3:5, 5, 0.35))

[1] 0.2351694
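The "at least 3" probability can also be obtained from the distribution function pbinom instead of summing the density. A short sketch, using the same n = 5 and p = 0.35 and illustrative variable names:

```r
n = 5
p = 0.35

by.sum = sum(dbinom(3:5, n, p))                    # summing the density
by.complement = 1 - pbinom(2, n, p)                # complement of "at most 2 successes"
by.upper = pbinom(2, n, p, lower.tail = FALSE)     # upper tail directly: P(X > 2)

c(by.sum, by.complement, by.upper)                 # all three agree
```

The cutoff is 2 rather than 3 because pbinom(q, ...) gives P(X <= q).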


Poisson distribution
The number of traffic accidents per week in a small city has a Poisson distribution with mean equal to 3. What is the probability of

two accidents in a week?

dpois(2, 3)

[1] 0.2240418

at most one accident in a week?

sum(dpois(0:1, 3))

[1] 0.1991483
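A simulation gives a rough check on the "at most one accident" probability. The sketch below draws 100,000 simulated weeks with rpois and compares the observed share with the exact value from ppois; the seed and the number of draws are arbitrary choices made here.

```r
set.seed(42)                        # arbitrary seed so the run is repeatable
weeks = rpois(100000, lambda = 3)   # simulated weekly accident counts

exact = ppois(1, 3)                 # P(X <= 1), same as sum(dpois(0:1, 3))
simulated = mean(weeks <= 1)        # share of simulated weeks with at most one accident

c(exact = exact, simulated = simulated)
```

The two values should be close, and the gap shrinks as the number of simulated weeks grows.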


Normal distribution
Scores on an exam are distributed normally with a mean of 65 and a standard deviation of 12. What percentage of the students have scores

below 50?

pnorm(50, 65, 12)

[1] 0.1056498

between 50 and 70?

pnorm(70, 65, 12) - pnorm(50, 65, 12)

[1] 0.5558891


Normal distribution (cont.)

What is the 90th percentile of the score distribution?

qnorm(.90, 65, 12)

[1] 80.37862
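The same answers follow from standardizing to z-scores, which is how these problems are usually worked with a normal table. A brief sketch with the exam's mean and standard deviation; the variable names are illustrative.

```r
mu = 65
sigma = 12

z = (50 - mu) / sigma                  # z-score of a score of 50
below50 = pnorm(z)                     # same value as pnorm(50, mu, sigma)

p90 = mu + qnorm(0.90) * sigma         # same value as qnorm(0.90, mu, sigma)

# qnorm is the inverse of pnorm, so this recovers 0.90
back = pnorm(qnorm(0.90, mu, sigma), mu, sigma)

c(below50, p90, back)
```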


Hypothesis Testing and Confidence Intervals


One sample means
Two sample means
One sample proportions
Two sample proportions


Hypothesis testing for one sample means


Is there evidence to suggest that the average age in months for Stats 10 students is more than 235 months? Use α = 0.05.

sample100 = sample(1:1325, 100, replace = FALSE)
survey.sub = survey[sample100, ]
t.test(survey.sub$ageinmonths, alternative = "greater", mu = 235, conf.level = 0.95)

        One Sample t-test

data:  survey.sub$ageinmonths
t = 1.5922, df = 99, p-value = 0.05726
alternative hypothesis: true mean is greater than 235
95 percent confidence interval:
 234.9118      Inf
sample estimates:
mean of x
   237.06

Because the sample is drawn at random, your output will differ. Here the p-value (0.05726) is slightly above 0.05, so we fail to reject the null hypothesis at this level.


Confidence intervals for one sample means


The t.test function prints out a confidence interval as well. However, it returns a one-sided interval when the alternative is "greater" or "less".
When alternative = "greater" is chosen, the lower confidence bound is calculated and the upper bound is given as Inf by default.
When alternative = "less" is chosen, the upper confidence bound is calculated and the lower bound is given as -Inf by default.
When alternative = "two.sided" is chosen, both the upper and the lower confidence bounds are calculated.


Confidence intervals for one sample means (cont.)


t.test(survey.sub$ageinmonths, alternative = "two.sided", mu = 235, conf.level = 0.90)

        One Sample t-test

data:  survey.sub$ageinmonths
t = 1.5922, df = 99, p-value = 0.1145
alternative hypothesis: true mean is not equal to 235
90 percent confidence interval:
 234.9118 239.2082
sample estimates:
mean of x
   237.06

Note that we changed the confidence level to 0.90 in order to correspond to a one-sided hypothesis test with α = 0.05.
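The correspondence between the one-sided 95% bound and the two-sided 90% interval can be checked directly. The sketch below uses simulated data as a stand-in for the sampled ages (the seed, mean and standard deviation are arbitrary assumptions) and pulls the bounds out of the objects returned by t.test.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
x = rnorm(100, mean = 237, sd = 16)  # simulated stand-in for the sampled ages

one.sided = t.test(x, alternative = "greater", mu = 235, conf.level = 0.95)
two.sided = t.test(x, alternative = "two.sided", mu = 235, conf.level = 0.90)

one.sided$conf.int[1]   # lower bound of the one-sided 95% interval
two.sided$conf.int[1]   # lower bound of the two-sided 90% interval (same value)

# When the sample mean is above mu, the two-sided p-value is twice the one-sided one
c(one.sided$p.value, two.sided$p.value)
```

Both lower bounds leave 5% in the lower tail, which is why they coincide.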


Confidence intervals for one sample means (cont.)


Alternative calculation of confidence interval:

onesample.mean.ci = function(x, conf.level) {
  # critical t value for a two-sided interval
  tstar = -qt(p = (1 - conf.level) / 2, df = length(x) - 1)
  xbar = mean(x)                     # sample mean
  sexbar = sd(x) / sqrt(length(x))   # standard error of the mean
  cilower = xbar - tstar * sexbar
  ciupper = xbar + tstar * sexbar
  return(c(cilower, ciupper))
}
onesample.mean.ci(survey.sub$ageinmonths, 0.90)

[1] 234.9118 239.2082
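A hand-written interval like this can be compared with the one stored in the object that t.test returns. The sketch below repeats the function so that it runs on its own and applies it to simulated data (an assumption made here, since the survey sample is random and not reproduced).

```r
# Same calculation as on the slide, repeated so this example is self-contained
onesample.mean.ci = function(x, conf.level) {
  tstar = -qt(p = (1 - conf.level) / 2, df = length(x) - 1)
  sexbar = sd(x) / sqrt(length(x))
  c(mean(x) - tstar * sexbar, mean(x) + tstar * sexbar)
}

set.seed(2)                          # arbitrary seed
x = rnorm(100, mean = 237, sd = 16)  # simulated stand-in for the sampled ages

by.hand = onesample.mean.ci(x, 0.90)
built.in = as.numeric(t.test(x, conf.level = 0.90)$conf.int)  # drop the conf.level attribute

by.hand
built.in
```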



Hypothesis testing and CI for two sample means


Is there a difference between the ages of females and males? Construct a 95% confidence interval for the difference between the average ages of females and males.

t.test(survey.sub$ageinmonths[survey.sub$gender == "female"],
       survey.sub$ageinmonths[survey.sub$gender == "male"],
       alternative = "two.sided", conf.level = 0.95)

        Welch Two Sample t-test

data:  survey.sub$ageinmonths[survey.sub$gender == "female"] and
       survey.sub$ageinmonths[survey.sub$gender == "male"]
t = 1.25, df = 95.736, p-value = 0.2143
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.765100  7.768572
sample estimates:
mean of x mean of y
 238.1406  235.1389


Hypothesis testing for one sample proportions


64 out of 100 students in a random sample are females. Is there evidence to suggest that the population proportion of females is less than 65%? Use a 90% confidence level.

prop.test(64, 100, p = 0.65, alternative = "less", conf.level = 0.90)

        1-sample proportions test with continuity correction

data:  64 out of 100, null probability 0.65
X-squared = 0.011, df = 1, p-value = 0.4583
alternative hypothesis: true p is less than 0.65
90 percent confidence interval:
 0.0000000 0.7035286
sample estimates:
   p
0.64


Confidence intervals for one sample proportions


Just like t.test, the prop.test function calculates both the upper and the lower bounds of the confidence interval only when alternative = "two.sided" is chosen. Otherwise a lower bound of 0 or an upper bound of 1 is produced.

prop.test(64, 100, p = 0.65, alternative = "two.sided", conf.level = 0.80)

        1-sample proportions test with continuity correction

data:  64 out of 100, null probability 0.65
X-squared = 0.011, df = 1, p-value = 0.9165
alternative hypothesis: true p is not equal to 0.65
80 percent confidence interval:
 0.5715825 0.7035286
sample estimates:
   p
0.64


Hypothesis testing and CI for two sample proportions


54 out of 64 females and 32 out of 36 males are right handed. Is there evidence to suggest that proportions of males and females who are right handed are different?

prop.test(c(54, 32), c(64, 36))

        2-sample test for equality of proportions with continuity correction

data:  c(54, 32) out of c(64, 36)
X-squared = 0.1051, df = 1, p-value = 0.7458
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.2026789  0.1124012
sample estimates:
   prop 1    prop 2
0.8437500 0.8888889
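The X-squared statistic is closely related to the pooled two-proportion z statistic from introductory courses. The sketch below computes z by hand from the counts above and compares its square with prop.test run without the continuity correction (correct = FALSE), which is an option switched off here only to make the two numbers match exactly.

```r
x = c(54, 32)                  # right-handed counts from the slide
n = c(64, 36)                  # group sizes from the slide

p.hat = x / n                  # sample proportions
p.pool = sum(x) / sum(n)       # pooled proportion under the null hypothesis
z = (p.hat[1] - p.hat[2]) / sqrt(p.pool * (1 - p.pool) * sum(1 / n))

z.squared = z^2
x.squared = unname(prop.test(x, n, correct = FALSE)$statistic)

c(z.squared, x.squared)        # the two values agree
```

With the default continuity correction, as on the slide, the statistic is somewhat smaller.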


Linear Regression
Scatterplots, Association, and Correlation
Simple Linear Regression


Scatterplots
Is there an association between amount of alcohol consumed and maximum speed?

plot(speed ~ alcohol, main = "Scatterplot of Speed vs. Alcohol", pch = 20, cex = 0.5)

[Figure: scatterplot titled "Scatterplot of Speed vs. Alcohol"; x-axis: alcohol, y-axis: speed]

Correlation

cor(alcohol, speed, use = "pairwise.complete.obs")

[1] 0.2309745


Simple Linear Regression


Build a linear regression model predicting speed from alcohol.

summary(lm(speed ~ alcohol))

Call:
lm(formula = speed ~ alcohol)

Residuals:
    Min      1Q  Median      3Q     Max
-90.769  -8.725   1.275  11.275  91.541

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  88.7248     0.6511 136.261   <2e-16 ***
alcohol       0.9469     0.1108   8.549   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.83 on 1297 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared: 0.05335, Adjusted R-squared: 0.05262
F-statistic: 73.09 on 1 and 1297 DF,  p-value: < 2.2e-16
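The pieces of this output connect to the correlation computed earlier: in simple linear regression, R-squared is the squared correlation and the slope is the correlation times the ratio of standard deviations. The sketch below illustrates this on simulated data; the variable names alcohol.sim and speed.sim, the seed, and the coefficients used to generate the data are assumptions made here and are not the survey values.

```r
set.seed(3)                                              # arbitrary seed
alcohol.sim = runif(200, min = 0, max = 10)              # simulated predictor
speed.sim = 90 + 1 * alcohol.sim + rnorm(200, sd = 20)   # simulated response

fit = lm(speed.sim ~ alcohol.sim)
coef(fit)                                   # intercept and slope

r = cor(alcohol.sim, speed.sim)
r.squared = summary(fit)$r.squared          # "Multiple R-squared" in the printout
slope.from.r = r * sd(speed.sim) / sd(alcohol.sim)

c(r^2, r.squared)                           # equal in simple linear regression
c(unname(coef(fit)[2]), slope.from.r)       # slope recovered from r and the two sds
```

On the slide, the same relationship links the correlation of 0.2309745 to the Multiple R-squared of 0.05335.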



Online Resources for R

Download R: http://cran.stat.ucla.edu/
Search Engine for R: rseek.org
R Reference Card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
UCLA Statistics Information Portal: http://info.stat.ucla.edu/grad/
UCLA Statistical Consulting Center: http://scc.stat.ucla.edu


Upcoming Mini-Courses

May 5, R Stats II: Linear Regression
May 10, R Stats III: Nonlinear Regression
May 12, LaTeX V: Creating Vector Graphics in LaTeX
For a schedule of all mini-courses offered please visit http://scc.stat.ucla.edu/mini-courses.


Thank you
Any questions?


Exercises
1. Construct side-by-side box plots for the distribution of the amount of time it takes students to get to class (time) by their means of transportation (walk).

2. Usually younger students live on campus and older students live off campus. Is there evidence to suggest this trend in this data set? (Use a random sample of 100 students and α = 0.05.)

3. Calculate a 90% confidence interval for the difference between the average ages of students who live on campus and off campus.


Solution to Exercise 1
boxplot(time ~ walk, main = "Time to get to class\nby type of transportation")

[Figure: side-by-side boxplots titled "Time to get to class by type of transportation"; y-axis: Minutes; categories: bicycle, bus, car (by yourself), carpool, motorcycle, other, segway, skateboard, walk]


Solution to Exercise 2
t.test(survey.sub$ageinmonths[survey.sub$oncampus == "yes"],
       survey.sub$ageinmonths[survey.sub$oncampus == "no"],
       alternative = "less", conf.level = 0.95)

        Welch Two Sample t-test

data:  survey.sub$ageinmonths[survey.sub$oncampus == "yes"] and
       survey.sub$ageinmonths[survey.sub$oncampus == "no"]
t = -5.3322, df = 34.867, p-value = 2.964e-06
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -10.85376
sample estimates:
mean of x mean of y
 232.6111  248.5000


Solution to Exercise 3
t.test(survey.sub$ageinmonths[survey.sub$oncampus == "yes"],
       survey.sub$ageinmonths[survey.sub$oncampus == "no"],
       alternative = "two.sided", conf.level = 0.90)

        Welch Two Sample t-test

data:  survey.sub$ageinmonths[survey.sub$oncampus == "yes"] and
       survey.sub$ageinmonths[survey.sub$oncampus == "no"]
t = -5.3322, df = 34.867, p-value = 5.929e-06
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 -20.92402 -10.85376
sample estimates:
mean of x mean of y
 232.6111  248.5000
