Prelim. Data Descriptive Statistics Prob. Models Hyp. Test & CI Linear Reg.

Resources Upcoming Exercises

UCLA Department of Statistics


Statistical Consulting Center

Introductory Statistics with R

Mine Çetinkaya
mine@stat.ucla.edu

April 2, 2009

Mine Çetinkaya mine@stat.ucla.edu


Introductory Statistics with R UCLA SCC

Outline
1 Preliminaries

2 Data sets

3 Descriptive Statistics

4 Probability Models

5 Hypothesis Testing and Confidence Intervals

6 Linear Regression

7 Online Resources for R

8 Upcoming Mini-Courses

9 Exercises

1 Preliminaries

Software Installation

Installing R on a Mac

1 Go to http://cran.r-project.org/ and select MacOS X.
2 Download the latest version: 2.8.1 (2008-12-22).
3 Install and open R. The R window should look like this: [screenshot of the R window]


R Help

For help with any function in R, put a question mark before the function name. The help page lists the function's arguments and gives examples and background information.
?plot

2 Data sets
Loading data into R

Loading a data set into R (header = TRUE: the first line holds the variable names; sep = "\t": the file is tab-delimited):

survey = read.table("http://www.stat.ucla.edu/~mine/students_survey_2008.txt",
                    header = TRUE, sep = "\t")

Displaying the dimensions of the data set:

dim(survey)
[1] 1325   29
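If the course file is no longer reachable, read.table() can be tried on a small inline sample instead. The sketch below uses a made-up, three-row, tab-separated excerpt with three of the survey's column names (the values are invented for illustration) and the text = argument, which assumes R 2.14.0 or later.

```r
# Made-up tab-separated excerpt; the first line holds the variable names
txt = "gender\thand\teyecolor
female\tleft\thazel
male\tright\tbrown
female\tright\tbrown"
demo = read.table(text = txt, header = TRUE, sep = "\t")
dim(demo)     # 3 rows, 3 columns
names(demo)   # "gender" "hand" "eyecolor"
```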

Viewing data sets in R


Displaying the first 3 rows and 5 columns of the data set:
survey[1:3, 1:5]
  gender  hand eyecolor glasses california
1 female  left    hazel     yes        yes
2   male right    brown      no         no
3 female right    brown     yes        yes

Displaying the variable names in the data set:
names(survey)
 [1] "gender"      "hand"        "eyecolor"    "glasses"     "california"
 [6] "birthmonth"  "birthday"    "birthyear"   "ageinmonths" "height"
[11] "graduate"    "oncampus"    "time"        "walk"        "hsclass"
...

Attaching the data set, so that its variables can be referred to by name (for example, gender instead of survey$gender):
attach(survey)
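attach() places a copy of the data frame on the search path, so later changes to survey are not seen through the attached names, and an object with the same name in the workspace masks the attached column. A common alternative is with(), which evaluates a single expression using the columns of a data frame. The data frame below is a small hypothetical stand-in for survey.

```r
# Hypothetical stand-in for survey
toy = data.frame(gender = c("female", "male", "female"),
                 ageinmonths = c(230, 241, 228))
# Refer to columns by name inside with(), without attaching the data frame
with(toy, mean(ageinmonths))   # 233
with(toy, table(gender))
```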

3 Descriptive Statistics
Variable classes

Displaying the class of a variable:
class(gender)
[1] "integer"

Changing the class of a variable:
gender = as.factor(gender)
class(gender)
[1] "factor"

Displaying categorical data

Tables

Tables are useful for displaying the distribution of categorical variables.
table(gender)
gender
female   male
   882    443

Contingency tables

Contingency tables display the joint distribution of two categorical variables.
table(gender, hand)
        hand
gender   ambidextrous left right
  female            9   67   806
  male             11   45   387
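Raw counts are hard to compare when the groups differ in size (882 females vs. 443 males). Row proportions make the comparison direct. The sketch re-enters the counts from the table above by hand rather than reading the survey file.

```r
# Counts re-entered by hand from the contingency table above
tab = as.table(matrix(c(9, 67, 806,
                        11, 45, 387),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(gender = c("female", "male"),
                                      hand = c("ambidextrous", "left", "right"))))
addmargins(tab)                          # adds row and column totals
round(prop.table(tab, margin = 1), 3)    # proportions within each gender; each row sums to 1
```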

Frequency bar plots


Bar plots display the count in each category side by side for easy comparison.
barplot(table(gender), main = "Barplot of Gender")
[Figure: Barplot of Gender, with one bar each for female and male]

Relative frequency bar plots


Relative frequency bar plots display the proportion of cases in each category.
barplot(table(gender) / length(gender), main = "Relative Frequency\nBarplot of Gender")
[Figure: Relative Frequency Barplot of Gender, with proportions (0 to 0.6) for female and male]
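prop.table() gives the same proportions when there are no missing values. The two can differ otherwise, because length() counts NA entries while table() drops them by default. The vector below is hypothetical.

```r
g = c("female", "male", "female", NA)   # hypothetical vector with one missing value
table(g) / length(g)    # 0.50 and 0.25: the NA is counted in the denominator
prop.table(table(g))    # about 0.667 and 0.333: only non-missing values are used
```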

Segmented bar charts


Segmented bar charts display two categorical variables at a time. Here each bar is a handedness category (ambidextrous, left, right), split into segments by gender.
barplot(table(gender, hand), col = c("skyblue", "blue"),
        main = "Segmented Bar Plot\nof Gender")
legend("topleft", c("females", "males"), col = c("skyblue", "blue"),
       pch = 16, inset = 0.05)
[Figure: Segmented Bar Plot of Gender, stacked bars for ambidextrous, left and right (counts 0 to 800), with a legend for females and males]

Pie charts
Pie charts display counts as percentages of individuals in each category.
pct = round(table(gender) / length(gender) * 100)
lbls = paste(names(table(gender)), "\n", pct, "%", sep = "")
pie(table(gender), labels = lbls)
[Figure: pie chart with slices labelled female 67% and male 33%]

Displaying quantitative data

Histograms
Histograms display the number of cases in each bin.
hist(ageinmonths, main = "Histogram of Age in Months")
[Figure: Histogram of Age in Months; x-axis ageinmonths (200 to 350), y-axis Frequency (0 to 400)]

Relative frequency histograms


Relative frequency histograms display the density in each bin, that is, the proportion of cases in the bin divided by the bin width, so that the bar areas sum to 1.
hist(ageinmonths, freq = FALSE, main = "Relative Frequency\nHistogram of Age in Months",
     xlab = "Age in Months")
[Figure: Relative Frequency Histogram of Age in Months; x-axis Age in Months (200 to 350), y-axis Density (0 to 0.030)]
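The relationship between counts and densities can be checked with hist(..., plot = FALSE), which returns the counts, densities and bin breaks without drawing. The ages below are simulated, not the survey data.

```r
set.seed(2009)                                # arbitrary seed
x = round(rnorm(500, mean = 238, sd = 16))    # simulated ages in months
h = hist(x, plot = FALSE)                     # compute the histogram without drawing it
dens = h$counts / (sum(h$counts) * diff(h$breaks))   # proportion in bin / bin width
all.equal(h$density, dens)                    # TRUE
sum(h$density * diff(h$breaks))               # total bar area: 1
```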

Stem-and-Leaf Plots
Stem-and-leaf plots preserve the individual data values.
stem(ageinmonths)

The decimal point is 1 digit(s) to the right of the |

20 | 48
21 | 004444555566666666666666666777777777778888888888889999999999999999
22 | 00000000000000000000000000111111111111111122222222222222222222333333+258
23 | 00000000000000000000000000000000000000000000001111111111111111111111+379
24 | 00000000000000000000000000000000000000000000111111111111111111111111+170
25 | 00000000000001111111111111112222222222222222222223333333344444444445+24
26 | 000000000001111111111222222333334444444444556666778889
27 | 00111222222344566789
28 | 01334558888
29 | 0004569
30 | 267
31 | 02257
32 | 44
33 | 5
34 | 89
35 | 3

Boxplots
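boxplot(ageinmonths, main = "Boxplot of Age in Months")
[Figure: Boxplot of Age in Months, vertical axis from 200 to 350]

Five-number summary (Min, Q1, Median, Q3, Max):
fivenum(ageinmonths)
[1] 204 228 235 243 353
Note that fivenum() reports Tukey's hinges in place of Q1 and Q3; these can differ slightly from the quartiles given by summary() or quantile().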
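A tiny example makes the difference between hinges and quartiles visible. The vector 1:6 is chosen only for illustration; quantile() uses its default method (type 7).

```r
x = 1:6
fivenum(x)     # Tukey's hinges: 1.0 2.0 3.5 5.0 6.0
quantile(x)    # default quartiles: 1.00 2.25 3.50 4.75 6.00
```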

Describing distributions numerically

Summary

Categorical variables:
summary(hand)
ambidextrous         left        right
          20          112         1193

Quantitative variables:
summary(ageinmonths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  204.0   228.0   235.0   237.8   243.0   353.0

Measures of center
Mean (arithmetic average):
mean(ageinmonths)
[1] 237.8309

Median (the value that divides the histogram into two equal areas):
median(ageinmonths)
[1] 235

Mode (the most frequent value):
# sort the counts in decreasing order and take the value with the largest count
as.numeric(names(sort(-table(ageinmonths)))[1])
[1] 228
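The one-line mode above reports a single value even when several values tie for the largest count. A small helper, given the hypothetical name all_modes, returns every tied value; the input vector is made up.

```r
all_modes = function(x) {
  counts = table(x)                                 # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # every value reaching the top count
}
all_modes(c(228, 230, 230, 228, 241))   # 228 230
```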
Adding measures to plots


Adding the mean and median to a histogram:
hist(ageinmonths, main = "Histogram of Age in Months")
abline(v = mean(ageinmonths), col = "blue")     # vertical line at the mean
abline(v = median(ageinmonths), col = "green")  # vertical line at the median
legend("topright", c("Mean", "Median"), lty = 1, col = c("blue", "green"))
[Figure: Histogram of Age in Months with vertical lines marking the mean and the median, and a legend]
The mean (237.8) lies to the right of the median (235), as expected for this right-skewed distribution.
Measures of spread
Range (Min, Max):
range(ageinmonths)
[1] 204 353

IQR (Q3 - Q1):
IQR(ageinmonths)
[1] 15

Standard deviation:
sd(ageinmonths)
[1] 16.03965
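Each measure of spread can be reproduced from its definition. Note that range() returns the minimum and maximum rather than their difference, and that sd() divides by n - 1. The toy vector reuses the five-number summary values only for illustration.

```r
x = c(204, 228, 235, 243, 353)   # toy data
n = length(x)
sqrt(sum((x - mean(x))^2) / (n - 1))             # same as sd(x)
diff(range(x))                                   # range as one number: max - min = 149
unname(quantile(x, 0.75) - quantile(x, 0.25))    # same as IQR(x)
```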

4 Probability Models
Geometric

Geometric distribution

If the probability of success on each trial is 0.35, what is the probability that the first success occurs on the 5th trial? In R, dgeom(x, prob) gives the probability of x failures before the first success, so we use x = 4:
dgeom(4, 0.35)
[1] 0.06247719

Note: dgeom gives the density (the probability mass function, for discrete variables), pgeom gives the distribution function, qgeom gives the quantile function, and rgeom generates random deviates. The same d/p/q/r naming applies to the functions used for Binomial, Poisson, and Normal calculations.
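The geometric probability can be checked against its formula: four failures, each with probability 0.65, followed by one success. The cumulative version, pgeom(), gives the probability that the first success comes within the first five trials.

```r
p = 0.35
# dgeom(x, p) is P(x failures before the first success); first success on trial 5 means x = 4
manual = (1 - p)^4 * p
dgeom(4, p)
manual
# pgeom(x, p) is cumulative: P(first success within the first x + 1 trials)
pgeom(4, p)
1 - (1 - p)^5
```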

Binomial

Binomial distribution

If the probability of success is 0.35, what is the probability of exactly 3 successes in 5 trials?
dbinom(3, 5, 0.35)
[1] 0.1811469

What is the probability of at least 3 successes in 5 trials?
sum(dbinom(3:5, 5, 0.35))
[1] 0.2351694
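The "at least 3" probability can also come from the cumulative function pbinom(), either by subtracting from 1 or with lower.tail = FALSE, and the single probability from the binomial formula.

```r
# P(X >= 3) for X ~ Binomial(n = 5, p = 0.35), computed three equivalent ways
sum(dbinom(3:5, size = 5, prob = 0.35))
1 - pbinom(2, size = 5, prob = 0.35)
pbinom(2, size = 5, prob = 0.35, lower.tail = FALSE)
# P(X = 3) from the formula choose(n, k) * p^k * (1 - p)^(n - k)
choose(5, 3) * 0.35^3 * 0.65^2
```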

Poisson

Poisson distribution
The number of traffic accidents per week in a small city has a Poisson distribution with mean 3. What is the probability of exactly two accidents in a week?
dpois(2, 3)
[1] 0.2240418

What is the probability of at most one accident in a week?
sum(dpois(0:1, 3))
[1] 0.1991483
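The cumulative function ppois() gives "at most one" directly, and the same value follows from the Poisson probability mass function exp(-λ) λ^k / k! summed over k = 0 and 1.

```r
# P(X <= 1) for X ~ Poisson(3)
ppois(1, lambda = 3)   # cumulative probability, directly
exp(-3) * (3^0 / factorial(0) + 3^1 / factorial(1))   # pmf summed over k = 0, 1
```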

Normal

Normal distribution
Scores on an exam are normally distributed with a mean of 65 and a standard deviation of 12. What proportion of the students have scores below 50?
pnorm(50, 65, 12)
[1] 0.1056498

What proportion have scores between 50 and 70?
pnorm(70, 65, 12) - pnorm(50, 65, 12)
[1] 0.5558891

That is, about 10.6% of students score below 50 and about 55.6% score between 50 and 70.
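The same probabilities follow from standardizing: convert each score to a z-score, z = (x - μ)/σ, and use the standard normal distribution, which is pnorm()'s default.

```r
mu = 65; sigma = 12
z50 = (50 - mu) / sigma    # -1.25
z70 = (70 - mu) / sigma    # about 0.417
pnorm(z50)                 # same as pnorm(50, 65, 12)
pnorm(z70) - pnorm(z50)    # same as pnorm(70, 65, 12) - pnorm(50, 65, 12)
```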

Normal distribution (cont.)

What is the 90th percentile of the score distribution?
qnorm(0.90, 65, 12)
[1] 80.37862
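qnorm() is the inverse of pnorm(): a normal percentile equals μ + z·σ, where z is the matching standard normal percentile, and feeding the percentile back into pnorm() returns the original probability.

```r
mu = 65; sigma = 12
z90 = qnorm(0.90)                          # standard normal 90th percentile, about 1.2816
mu + z90 * sigma                           # same as qnorm(0.90, 65, 12)
pnorm(qnorm(0.90, mu, sigma), mu, sigma)   # qnorm undoes pnorm: gives back 0.9
```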

5 Hypothesis Testing and Confidence Intervals


One sample means

Hypothesis testing for one sample means


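Is there evidence to suggest that the average age of Stats 10 students is more than 235 months? Use α = 0.05.
We first draw a random sample of 100 of the 1325 students. Because the sample is random, rerunning this code gives slightly different results; call set.seed() before sample() to make your own results reproducible.
sample100 = sample(1:1325, 100, replace = FALSE)
survey.sub = survey[sample100, ]
t.test(survey.sub$ageinmonths, alternative = "greater", mu = 235, conf.level = 0.95)

        One Sample t-test

data:  survey.sub$ageinmonths
t = 1.5922, df = 99, p-value = 0.05726
alternative hypothesis: true mean is greater than 235
95 percent confidence interval:
 234.9118      Inf
sample estimates:
mean of x
   237.06

Since the p-value (0.057) is greater than α = 0.05, we fail to reject the null hypothesis: this sample does not provide convincing evidence that the average age exceeds 235 months.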
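The t statistic and one-sided p-value can be reproduced from their definitions, t = (x̄ - μ0)/(s/√n) with n - 1 degrees of freedom. This sketch uses simulated ages rather than survey.sub, so its numbers differ from the output above.

```r
set.seed(10)                              # arbitrary seed, for reproducibility
x = rnorm(100, mean = 237, sd = 16)       # simulated ages in months
mu0 = 235                                 # null-hypothesis mean
n = length(x)
tstat = (mean(x) - mu0) / (sd(x) / sqrt(n))
pval = pt(tstat, df = n - 1, lower.tail = FALSE)   # upper tail, for alternative = "greater"
res = t.test(x, alternative = "greater", mu = mu0)
c(tstat, unname(res$statistic))           # the two t statistics agree
c(pval, res$p.value)                      # and so do the p-values
```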

Confidence intervals for one sample means

The t.test function also prints a confidence interval. However, it returns a one-sided interval when the alternative is "greater" or "less":
When alternative = "greater", only the lower confidence bound is calculated, and the upper bound is reported as Inf.
When alternative = "less", only the upper confidence bound is calculated, and the lower bound is reported as -Inf.
When alternative = "two.sided" (the default), both the lower and the upper confidence bounds are calculated.

Confidence intervals for one sample means (cont.)

t.test(survey.sub$ageinmonths, alternative = "two.sided", mu = 235, conf.level = 0.90)

        One Sample t-test

data:  survey.sub$ageinmonths
t = 1.5922, df = 99, p-value = 0.1145
alternative hypothesis: true mean is not equal to 235
90 percent confidence interval:
 234.9118 239.2082
sample estimates:
mean of x
   237.06

Note that we changed the confidence level to 0.90 so that the two-sided interval corresponds to the one-sided hypothesis test with α = 0.05: its lower bound, 234.9118, is the same as the lower bound of the one-sided 95% interval.
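The claim that a one-sided 95% interval and a two-sided 90% interval share the same lower bound can be checked directly, since both use the t critical value for an upper-tail area of 0.05. The data are simulated, not survey.sub.

```r
set.seed(10)                              # arbitrary seed
x = rnorm(100, mean = 237, sd = 16)       # simulated ages in months
one_sided = t.test(x, alternative = "greater", conf.level = 0.95)$conf.int
two_sided = t.test(x, alternative = "two.sided", conf.level = 0.90)$conf.int
one_sided    # lower bound and Inf
two_sided    # the same lower bound, plus an upper bound
```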

Confidence intervals for one sample means (cont.)


Alternative calculation of the confidence interval, from the formula x̄ ± t* · s/√n:
onesample.mean.ci = function(x, conf.level) {
  # critical value t* for a two-sided interval with n - 1 degrees of freedom
  tstar = qt(p = 1 - (1 - conf.level) / 2, df = length(x) - 1)
  xbar = mean(x)
  sexbar = sd(x) / sqrt(length(x))   # standard error of the mean
  cilower = xbar - tstar * sexbar
  ciupper = xbar + tstar * sexbar
  return(c(cilower, ciupper))
}
onesample.mean.ci(survey.sub$ageinmonths, 0.90)
[1] 234.9118 239.2082

Two sample means

Hypothesis testing and CI for two sample means


Is there a difference between the ages of females and males? Construct a 95% confidence interval for the difference between the average ages of females and males. By default t.test() does not assume equal variances (var.equal = FALSE), so it runs Welch's two-sample t-test.
t.test(survey.sub$ageinmonths[survey.sub$gender == "female"],
       survey.sub$ageinmonths[survey.sub$gender == "male"],
       alternative = "two.sided", conf.level = 0.95)

        Welch Two Sample t-test

data:  survey.sub$ageinmonths[survey.sub$gender == "female"] and survey.sub$ageinmonths[survey.sub$gender == "male"]
t = 1.25, df = 95.736, p-value = 0.2143
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.765100  7.768572
sample estimates:
mean of x mean of y
 238.1406  235.1389
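The fractional degrees of freedom (df = 95.736) come from the Welch-Satterthwaite approximation. The helper below (hypothetical name welch_df) computes it from the two sample variances and compares it, and the t statistic, with t.test() on simulated data.

```r
# Welch-Satterthwaite degrees of freedom
welch_df = function(x, y) {
  vx = var(x) / length(x)   # squared standard error of mean(x)
  vy = var(y) / length(y)   # squared standard error of mean(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}
set.seed(1)                              # arbitrary seed
f = rnorm(64, mean = 238, sd = 12)       # simulated "female" ages
m = rnorm(36, mean = 235, sd = 11)       # simulated "male" ages
res = t.test(f, m)                       # Welch test is the default
tstat = (mean(f) - mean(m)) / sqrt(var(f) / 64 + var(m) / 36)
c(welch_df(f, m), unname(res$parameter))
c(tstat, unname(res$statistic))
```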

One sample proportions

Hypothesis testing for one sample proportions

64 out of 100 students in a random sample are female. Is there evidence to suggest that the population proportion of females is less than 65%? Use α = 0.10 (a 90% confidence level).
prop.test(64, 100, p = 0.65, alternative = "less", conf.level = 0.90)

        1-sample proportions test with continuity correction

data:  64 out of 100, null probability 0.65
X-squared = 0.011, df = 1, p-value = 0.4583
alternative hypothesis: true p is less than 0.65
90 percent confidence interval:
 0.0000000 0.7035286
sample estimates:
   p
0.64
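The X-squared value and the one-sided p-value can be reproduced by hand. As we understand prop.test()'s continuity correction, it subtracts 0.5 from |x - n·p0| (when that difference is at least 0.5), and the one-sided p-value is a normal tail area of the signed square root of the statistic.

```r
x = 64; n = 100; p0 = 0.65
# Continuity-corrected statistic (valid here because |x - n*p0| >= 0.5)
stat = (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
# One-sided p-value for alternative = "less": lower normal tail of the signed root
z = sign(x / n - p0) * sqrt(stat)
res = prop.test(x, n, p = p0, alternative = "less", conf.level = 0.90)
c(stat, unname(res$statistic))   # about 0.011 both times
c(pnorm(z), res$p.value)         # about 0.4583 both times
```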

Confidence intervals for one sample proportions


Just like t.test, the prop.test function calculates both the lower and the upper bounds of the confidence interval only when alternative = "two.sided" is chosen. Otherwise it reports a lower bound of 0 (alternative = "less") or an upper bound of 1 (alternative = "greater"). A two-sided 80% interval matches the one-sided test at α = 0.10: its upper bound, 0.7035286, is the same as the one-sided upper bound.
prop.test(64, 100, p = 0.65, alternative = "two.sided", conf.level = 0.80)

        1-sample proportions test with continuity correction

data:  64 out of 100, null probability 0.65
X-squared = 0.011, df = 1, p-value = 0.9165
alternative hypothesis: true p is not equal to 0.65
80 percent confidence interval:
 0.5715825 0.7035286
sample estimates:
   p
0.64

Two sample proportions

Hypothesis testing and CI for two sample proportions

54 out of 64 females and 32 out of 36 males are right-handed. Is there evidence to suggest that the proportions of males and females who are right-handed are different?
prop.test(c(54, 32), c(64, 36))

        2-sample test for equality of proportions with continuity correction

data:  c(54, 32) out of c(64, 36)
X-squared = 0.1051, df = 1, p-value = 0.7458
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.2026789  0.1124012
sample estimates:
   prop 1    prop 2
0.8437500 0.8888889
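The interval for the difference in proportions can be sketched by hand as a Wald interval with an unpooled standard error plus a continuity-correction term. This reflects our reading of how prop.test() builds its two-group interval, not a documented formula, so the last line compares the two.

```r
x = c(54, 32); n = c(64, 36)                        # right-handed counts and group sizes
phat = x / n
delta = phat[1] - phat[2]                           # difference in sample proportions
se = sqrt(sum(phat * (1 - phat) / n))               # unpooled standard error
cc = min(0.5, abs(delta) / sum(1 / n)) * sum(1 / n) # continuity correction term
ci = delta + c(-1, 1) * (qnorm(0.975) * se + cc)
ci                                                  # about -0.2027 and 0.1124
as.numeric(prop.test(x, n)$conf.int)
```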

6 Linear Regression
Scatterplots, Association, and Correlation

Scatterplots
Is there an association between the amount of alcohol consumed and maximum speed?
plot(speed ~ alcohol, main = "Scatterplot of Speed vs. Alcohol", pch = 20, cex = 0.5)
[Figure: Scatterplot of Speed vs. Alcohol; x-axis alcohol (0 to 80), y-axis speed (0 to 150)]

Correlation

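cor(alcohol, speed, use = "pairwise.complete.obs")
[1] 0.2309745
The argument use = "pairwise.complete.obs" ignores students with a missing value in either variable; without it, cor() returns NA when there are missing values. A correlation of about 0.23 indicates a weak positive linear association.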

Simple Linear Regression

Build a linear regression model predicting speed from alcohol:
summary(lm(speed ~ alcohol))

Call:
lm(formula = speed ~ alcohol)

Residuals:
    Min      1Q  Median      3Q     Max
-90.769  -8.725   1.275  11.275  91.541

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  88.7248     0.6511 136.261   <2e-16 ***
alcohol       0.9469     0.1108   8.549   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.83 on 1297 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared: 0.05335,  Adjusted R-squared: 0.05262
F-statistic: 73.09 on 1 and 1297 DF,  p-value: < 2.2e-16
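In simple linear regression the least-squares slope equals r · s_y / s_x, and R-squared equals r². The output above is consistent with this: 0.2309745² ≈ 0.0533, the reported Multiple R-squared. The check below uses simulated stand-ins for alcohol and speed, named x and y so they do not mask the attached survey variables.

```r
set.seed(42)                              # arbitrary seed
x = rpois(200, lambda = 10)               # simulated "alcohol" values
y = 89 + 0.9 * x + rnorm(200, sd = 20)    # simulated "speed" values
fit = lm(y ~ x)
r = cor(x, y)
coef(fit)[["x"]]           # least-squares slope
r * sd(y) / sd(x)          # the same slope, from the correlation
summary(fit)$r.squared     # R-squared ...
r^2                        # ... equals the squared correlation
```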

7 Online Resources for R

Online Resources for R

Download R: http://cran.stat.ucla.edu/
Search Engine for R: rseek.org
R Reference Card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
UCLA Statistics Information Portal: http://info.stat.ucla.edu/grad/
UCLA Statistical Consulting Center: http://scc.stat.ucla.edu

8 Upcoming Mini-Courses

Upcoming Mini-Courses

April 7, Tuesday: Introductory Statistics with R
April 9, Thursday: Basic R
April 14, Tuesday: Basic R
April 16, Thursday: Migrating to R for SAS/SPSS/Stata Users
For a schedule of all mini-courses offered, please visit http://scc.stat.ucla.edu/mini-courses

Thank you
Any questions?

9 Exercises
Exercises

1 Construct side-by-side box plots for the distribution of the amount of time it takes students to get to class (time) by their means of transportation (walk).
2 Younger students usually live on campus and older students live off campus. Is there evidence of this trend in this data set? (Use a random sample of 100 students and α = 0.05.)
3 Calculate a 90% confidence interval for the difference between the average ages of students who live on campus and off campus.

Solution to Exercise 1
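plot(time ~ walk, main = "Time to get to class\nby type of transportation")
When walk is a factor, plot() draws side-by-side box plots here; boxplot(time ~ walk) produces the same kind of display.
[Figure: side-by-side box plots titled "Time to get to class by type of transportation", showing time in minutes (0 to 150) for bicycle, bus, car (by yourself), carpool, motorcycle, other, segway, skateboard, and walk]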

Solution to Exercise 2

t.test(survey.sub$ageinmonths[survey.sub$oncampus == "yes"],
       survey.sub$ageinmonths[survey.sub$oncampus == "no"],
       alternative = "less", conf.level = 0.95)

        Welch Two Sample t-test

data:  survey.sub$ageinmonths[survey.sub$oncampus == "yes"] and survey.sub$ageinmonths[survey.sub$oncampus == "no"]
t = -5.3322, df = 34.867, p-value = 2.964e-06
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -10.85376
sample estimates:
mean of x mean of y
 232.6111  248.5000

Since the p-value is far below α = 0.05, the data provide strong evidence that students who live on campus are younger, on average, than students who live off campus.

Solution to Exercise 3

t.test(survey.sub$ageinmonths[survey.sub$oncampus == "yes"],
       survey.sub$ageinmonths[survey.sub$oncampus == "no"],
       alternative = "two.sided", conf.level = 0.90)

        Welch Two Sample t-test

data:  survey.sub$ageinmonths[survey.sub$oncampus == "yes"] and survey.sub$ageinmonths[survey.sub$oncampus == "no"]
t = -5.3322, df = 34.867, p-value = 5.929e-06
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 -20.92402 -10.85376
sample estimates:
mean of x mean of y
 232.6111  248.5000

We are 90% confident that, on average, students who live on campus are between about 10.9 and 20.9 months younger than students who live off campus.
