# Prelim. Data Descriptive Statistics Prob. Models Hyp. Test & CI Linear Reg.

## UCLA Department of Statistics

Statistical Consulting Center

## Introductory Statistics with R

Mine Çetinkaya
mine@stat.ucla.edu

April 2, 2009


Outline

1 Preliminaries
2 Data sets
3 Descriptive Statistics
4 Probability Models
5 Hypothesis Testing and Confidence Intervals
6 Linear Regression
7 Online Resources for R
8 Upcoming Mini-Courses
9 Exercises

## 1 Preliminaries

Software Installation

Installing R on a Mac

1 Go to http://cran.r-project.org/ and select MacOS X.
2 Download the latest version: 2.8.1 (2008-12-22).
3 Install and open R.

[Figure: screenshot of the R console window.]

R Help

For help with any function in R, put a question mark before the function name to see its arguments, examples, and background information.

?plot

## 2 Data sets

## Loading a data set into R:

survey = read.table("http://www.stat.ucla.edu/~mine/students_survey_2008.txt", header = TRUE, sep = "\t")

dim(survey)
[1] 1325   29
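
As a minimal sketch of what header and sep do, the same call can be tried on a small in-memory table. The three rows below are made up for illustration (they are not taken from the survey file), and the text argument stands in for the URL.

```r
# Hypothetical tab-delimited data; not the real survey file
txt <- "gender\thand\tageinmonths\nfemale\tleft\t228\nmale\tright\t241\nfemale\tright\t235"

# header = TRUE takes column names from the first line; sep = "\t" splits fields on tabs
toy <- read.table(text = txt, header = TRUE, sep = "\t")

dim(toy)    # number of rows, then number of columns
names(toy)  # column names read from the header line
```

dim() reports rows first and columns second, which is how the output above reads as 1325 students and 29 variables.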


## Viewing data sets in R

Displaying the first 3 rows and 5 columns of the data set:

survey[1:3, 1:5]
  gender  hand eyecolor glasses california
1 female  left    hazel     yes        yes
2   male right    brown      no         no
3 female right    brown     yes        yes

Displaying the variable names in the data set:

names(survey)
 [1] "gender"      "hand"        "eyecolor"    "glasses"     "california"
 [6] "birthmonth"  "birthday"    "birthyear"   "ageinmonths" "height"
[11] "graduate"    "oncampus"    "time"        "walk"        "hsclass"
...

Attaching the variables in a data set, so that columns can be referenced by name (for example, gender instead of survey$gender):

attach(survey)
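
A small sketch of what attaching changes, using a made-up data frame named toy (an assumption for illustration, not the survey data). The same column can be reached with $, with with(), or by bare name once the data frame is attached.

```r
# Hypothetical stand-in for the survey data frame
toy <- data.frame(gender = c("female", "male", "female"),
                  ageinmonths = c(228, 241, 235))

m_dollar <- mean(toy$ageinmonths)         # explicit column access
m_with   <- with(toy, mean(ageinmonths))  # column looked up inside toy

attach(toy)                               # puts toy's columns on the search path
m_attached <- mean(ageinmonths)           # bare column name now resolves
detach(toy)                               # undo the attach when finished
```

All three calls compute the same mean; detaching afterwards keeps the search path clean when several data sets share column names.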

## 3 Descriptive Statistics

Variable classes

## Displaying the class of a variable:

class(gender)
[1] "integer"

## Changing the class of a variable:

gender = as.factor(gender)
class(gender)
[1] "factor"
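
The conversion can be sketched on a toy vector. The coding 1 = female, 2 = male used here is hypothetical, chosen only to show how an integer variable becomes a labelled factor.

```r
# Hypothetical integer-coded variable (1 = female, 2 = male is an assumed coding)
g <- c(1L, 2L, 1L, 1L)
class(g)                      # integer codes are treated as numbers

# factor() converts the codes to categories and attaches readable labels
gf <- factor(g, levels = c(1L, 2L), labels = c("female", "male"))
class(gf)
levels(gf)
table(gf)                     # counts per category, as used on the next slides
```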


Tables

Tables are useful for displaying the distribution of categorical variables.

table(gender)
gender
female   male
   882    443


## Displaying categorical data

Contingency tables

Contingency tables display two categorical variables at a time.

table(gender, hand)
        hand
gender   ambidextrous left right
  female            9   67   806
  male             11   45   387
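
Counts in a contingency table are often easier to compare as proportions. The sketch below rebuilds the table from the counts printed above (so it runs without the survey file) and uses addmargins() and prop.table(), which are not on the slide.

```r
# Counts copied from the contingency table above
counts <- matrix(c(9, 67, 806,
                   11, 45, 387),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("female", "male"),
                                 hand = c("ambidextrous", "left", "right")))
tab <- as.table(counts)

addmargins(tab)                          # adds row and column totals
row_props <- prop.table(tab, margin = 1) # proportions within each gender
round(row_props, 3)
```

With margin = 1 each row sums to 1, so the handedness distribution can be compared between females and males despite the different group sizes.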


## Frequency bar plots

Display counts of each category next to each other for easy comparison.

barplot(table(gender), main = "Barplot of Gender")

[Figure: "Barplot of Gender", with one bar each for female and male; y-axis shows counts.]


## Relative frequency bar plots

Display the relative proportion of each category.

barplot(table(gender) / length(gender), main = "Relative Frequency\nBarplot of Gender")

[Figure: "Relative Frequency Barplot of Gender", with one bar each for female and male; y-axis from 0.0 to 0.6.]


## Segmented bar charts

Display two categorical variables at a time.

barplot(table(gender, hand), col = c("skyblue", "blue"), main = "Segmented Bar Plot\nof Gender")
legend("topleft", c("females", "males"), col = c("skyblue", "blue"), pch = 16, inset = 0.05)

[Figure: "Segmented Bar Plot of Gender", with one bar per hand category, each segmented by gender (females, males); y-axis shows counts from 0 to 800.]


## Displaying categorical data

Pie charts

Pie charts display counts as percentages of individuals in each category.

pct = round(table(gender) / length(gender) * 100)
lbls = paste(names(table(gender)), "\n", "%", pct)
pie(table(gender), labels = lbls)

[Figure: pie chart of gender with slices labelled "female % 67" and "male % 33".]
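
The label arithmetic can be checked without the data file by starting from the gender counts shown earlier (882 and 443). This is a sketch; paste0() and the "67%" label layout are choices made here, not the slide's.

```r
# Gender counts copied from the table(gender) output earlier in the deck
counts <- c(female = 882, male = 443)

# Percentage of individuals in each category, rounded to whole numbers
pct <- round(counts / sum(counts) * 100)

# paste0() avoids the extra spaces that paste() inserts between pieces
lbls <- paste0(names(counts), "\n", pct, "%")

pct
lbls
```

Passing lbls to pie(counts, labels = lbls) would label each slice with its category name and percentage.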


## Displaying quantitative data

Histograms

Display the number of cases in each bin.

hist(ageinmonths, main = "Histogram of Age in Months")

[Figure: "Histogram of Age in Months"; x-axis: ageinmonths, y-axis: Frequency (0 to 400).]


## Relative frequency histograms

Display the relative frequency of cases in each bin as a density, so that the bar areas sum to 1.

hist(ageinmonths, freq = FALSE, main = "Relative Frequency\nHistogram of Age in Months", xlab = "Age in Months")

[Figure: "Relative Frequency Histogram of Age in Months"; x-axis: Age in Months, y-axis: Density (0.000 to 0.030).]
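
To see how the density scale relates to counts, the histogram object can be inspected without drawing it. The ages below are simulated (an assumption, since the survey file is not loaded here); plot = FALSE returns the bins instead of plotting.

```r
set.seed(1)                                     # arbitrary seed for reproducibility
age <- round(rnorm(200, mean = 238, sd = 16))   # simulated stand-in for ageinmonths

h <- hist(age, plot = FALSE)   # compute bins only

h$counts                                  # cases per bin (freq = TRUE scale)
h$density                                 # counts / (n * bin width) (freq = FALSE scale)
total_area <- sum(h$density * diff(h$breaks))
total_area                                # bar areas add up to 1
```

Because each density is divided by the bin width, bar heights are proportions only when every bin has width 1.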


## Displaying quantitative data

Stem-and-Leaf Plots

Preserve individual data values.

stem(ageinmonths)

  The decimal point is 1 digit(s) to the right of the |

20 | 48
21 | 004444555566666666666666666777777777778888888888889999999999999999
22 | 00000000000000000000000000111111111111111122222222222222222222333333+258
23 | 00000000000000000000000000000000000000000000001111111111111111111111+379
24 | 00000000000000000000000000000000000000000000111111111111111111111111+170
25 | 00000000000001111111111111112222222222222222222223333333344444444445+24
26 | 000000000001111111111222222333334444444444556666778889
27 | 00111222222344566789
28 | 01334558888
29 | 0004569
30 | 267
31 | 02257
32 | 44
33 | 5
34 | 89
35 | 3


## Displaying quantitative data

Boxplots

boxplot(ageinmonths, main = "Boxplot of Age in Months")

[Figure: "Boxplot of Age in Months"; vertical axis from 200 to 350.]

## Five Number Summary (Min, Q1, Median, Q3, Max):

fivenum(ageinmonths)
[1] 204 228 235 243 353
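
One detail worth knowing: fivenum() reports Tukey's hinges, which can differ slightly from the quartiles returned by quantile() with its default settings. A small sketch on a toy vector (the values 1 to 8 are chosen only for illustration):

```r
x <- 1:8                               # toy data, not from the survey

five <- fivenum(x)                     # min, lower hinge, median, upper hinge, max
quart <- quantile(x, c(0.25, 0.75))    # default (type 7) quartiles

five
quart
```

On the survey variable the two agree (228 and 243 appear in both fivenum and summary), but on small samples like this one they need not.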


## Describing distributions numerically

Summary

Categorical variables:

summary(hand)
ambidextrous         left        right
          20          112         1193

Quantitative variables:

summary(ageinmonths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  204.0   228.0   235.0   237.8   243.0   353.0


## Describing distributions numerically

Measures of center

Mean (arithmetic average):

mean(ageinmonths)
[1] 237.8309

Median (value that divides the histogram into two equal areas):

median(ageinmonths)
[1] 235

Mode (the most frequent value):

as.numeric(names(sort(-table(ageinmonths)))[1])
[1] 228
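
R has no built-in function for the statistical mode, which is why the one-liner above goes through table(). The sketch below unpacks it on a toy vector (values invented for illustration) and shows an equivalent form using which.max().

```r
x <- c(228, 235, 228, 241, 235, 228)   # toy data, not from the survey

tab <- table(x)                        # frequency of each distinct value

# Slide's approach: sort by descending frequency and take the first name
mode_sorted <- as.numeric(names(sort(-tab))[1])

# Equivalent: pick the position of the largest count directly
mode_max <- as.numeric(names(tab)[which.max(tab)])

mode_sorted
mode_max
```

If several values tie for the highest frequency, both forms return only one of them, so a tie is worth checking for separately.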

## Adding measures to plots

Adding mean and median to a histogram.

hist(ageinmonths, main = "Histogram of Age in Months")
abline(v = mean(ageinmonths), col = "blue")
abline(v = median(ageinmonths), col = "green")
legend("topright", c("Mean", "Median"), pch = 16, col = c("blue", "green"))

[Figure: "Histogram of Age in Months" with vertical lines at the mean (blue) and median (green); x-axis: ageinmonths (200 to 350), y-axis: Frequency (0 to 400).]

## Describing distributions numerically

Measures of spread

Range (Min, Max):

range(ageinmonths)
[1] 204 353

IQR:

IQR(ageinmonths)
[1] 15

Standard deviation:

sd(ageinmonths)
[1] 16.03965
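
For reference, each of these measures can be reproduced from its definition. The sketch uses a made-up vector x; sd() divides by n - 1 (the sample standard deviation), and IQR() is the distance between the default quartiles.

```r
x <- c(210, 228, 231, 235, 240, 243, 262, 300)   # toy data, not from the survey
n <- length(x)

# Sample standard deviation: square root of the sum of squared deviations over n - 1
sd_manual <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Interquartile range: third quartile minus first quartile
iqr_manual <- unname(diff(quantile(x, c(0.25, 0.75))))

# range() returns c(min, max); the width of the range is their difference
range_width <- diff(range(x))

c(sd_manual, sd(x))
c(iqr_manual, IQR(x))
range_width
```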

## 4 Probability Models

Geometric distribution

If the probability of success is 0.35, what is the probability that the first success will be on the 5th trial?

dgeom(4, 0.35)
[1] 0.06247719

The first argument is 4 because dgeom counts the failures before the first success: a first success on the 5th trial means 4 failures.

Note: dgeom gives the density (or probability mass function for discrete variables), pgeom gives the distribution function, qgeom gives the quantile function, and rgeom generates random deviates. The same d, p, q, and r prefixes apply to the functions used for Binomial, Poisson, and Normal calculations.
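
A short sketch of the four prefixes for the geometric case, with the density checked against the formula (1 - p)^k * p. The seed value is arbitrary and only makes the random draws repeatable.

```r
p <- 0.35

d_val <- dgeom(4, p)        # P(exactly 4 failures before the first success)
d_formula <- (1 - p)^4 * p  # same probability written out

p_val <- pgeom(4, p)        # P(at most 4 failures), i.e. first success by trial 5
q_val <- qgeom(0.5, p)      # smallest failure count with cumulative probability >= 0.5

set.seed(1)                 # arbitrary seed
r_val <- rgeom(3, p)        # three simulated failure counts

d_val
p_val
q_val
r_val
```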


Binomial distribution

If the probability of success is 0.35, what is the probability of

3 successes in 5 trials?

dbinom(3, 5, 0.35)
[1] 0.1811469

at least 3 successes in 5 trials?

sum(dbinom(3:5, 5, 0.35))
[1] 0.2351694
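
The "at least" probability can also be obtained from the distribution function rather than by summing densities. This sketch uses pbinom() with lower.tail = FALSE, which is not on the slide.

```r
# Sum of the individual probabilities, as on the slide
p_sum <- sum(dbinom(3:5, 5, 0.35))

# Upper tail: P(X > 2) is the same event as P(X >= 3) for a count
p_upper <- pbinom(2, 5, 0.35, lower.tail = FALSE)

# Complement of the lower tail gives the same value
p_complement <- 1 - pbinom(2, 5, 0.35)

c(p_sum, p_upper, p_complement)
```

Note the cutoff of 2, not 3: pbinom(q, ...) includes q itself in the lower tail.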


Poisson distribution

The number of traffic accidents per week in a small city has a Poisson distribution with mean equal to 3. What is the probability of

two accidents in a week?

dpois(2, 3)
[1] 0.2240418

at most one accident in a week?

sum(dpois(0:1, 3))
[1] 0.1991483


Normal distribution

Scores on an exam are distributed normally with a mean of 65 and a standard deviation of 12. What percentage of the students have scores

below 50?

pnorm(50, 65, 12)
[1] 0.1056498

between 50 and 70?

pnorm(70, 65, 12) - pnorm(50, 65, 12)
[1] 0.5558891

## What is the 90th percentile of the score distribution?

qnorm(0.90, 65, 12)

[1] 80.37862
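
These calls connect to the usual z-score calculation. The sketch below standardizes by hand and confirms that qnorm() and pnorm() are inverses of each other; the variable names are chosen here for illustration.

```r
mu <- 65
sigma <- 12

# Standardize: z = (x - mean) / sd, then use the standard normal
z <- (50 - mu) / sigma
p_below_z <- pnorm(z)                   # standard normal probability
p_below_direct <- pnorm(50, mu, sigma)  # same probability, parameters passed directly

# qnorm() inverts pnorm(): the 90th percentile has 90% of scores below it
q90 <- qnorm(0.90, mu, sigma)
p_back <- pnorm(q90, mu, sigma)

c(z, p_below_z, p_below_direct)
c(q90, p_back)
```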

## 5 Hypothesis Testing and Confidence Intervals

## Hypothesis testing for one sample means

Is there evidence to suggest that the average age in months for Stats 10 students is more than 235 months? Use α = 0.05.

sample100 = sample(1:1325, 100, replace = FALSE)
survey.sub = survey[sample100, ]
t.test(survey.sub$ageinmonths, alternative = "greater", mu = 235, conf.level = 0.95)

	One Sample t-test

data:  survey.sub$ageinmonths
t = 1.5922, df = 99, p-value = 0.05726
alternative hypothesis: true mean is greater than 235
95 percent confidence interval:
 234.9118      Inf
sample estimates:
mean of x
   237.06

Since the p-value (0.05726) is greater than α = 0.05, we fail to reject the null hypothesis: this sample does not provide sufficient evidence that the average age exceeds 235 months.
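
Because sample() draws a different subset on every run, the numbers above will not reproduce exactly. A minimal sketch of a repeatable version follows, on simulated ages (an assumption; the real survey file is not loaded) with an arbitrary seed, together with the test statistic computed by hand.

```r
set.seed(2009)                                    # arbitrary seed, assumed for repeatability
ages <- round(rnorm(1325, mean = 238, sd = 16))   # simulated stand-in for ageinmonths
sub <- sample(ages, 100, replace = FALSE)         # random sample of 100 values

res <- t.test(sub, alternative = "greater", mu = 235, conf.level = 0.95)

# t = (sample mean - hypothesized mean) / (s / sqrt(n)), with n - 1 degrees of freedom
t_manual <- (mean(sub) - 235) / (sd(sub) / sqrt(length(sub)))
p_manual <- pt(t_manual, df = length(sub) - 1, lower.tail = FALSE)

reject <- res$p.value < 0.05   # decision at alpha = 0.05
res$statistic
res$p.value
reject
```

The components of the returned object (statistic, p.value, conf.int) can be used directly instead of reading them off the printed output.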


The t.test function prints out a confidence interval as well. However, this function returns a one-sided interval when the alternative is "greater" or "less":

When alternative = "greater" is chosen, the lower confidence bound is calculated and the upper bound is given as Inf by default.
When alternative = "less" is chosen, the upper confidence bound is calculated and the lower bound is given as -Inf by default.
When alternative = "two.sided" is chosen, both the upper and the lower confidence bounds are calculated.


t.test(survey.sub$ageinmonths, alternative = "two.sided", mu = 235, conf.level = 0.90)

	One Sample t-test

data:  survey.sub$ageinmonths
t = 1.5922, df = 99, p-value = 0.1145
alternative hypothesis: true mean is not equal to 235
90 percent confidence interval:
 234.9118 239.2082
sample estimates:
mean of x
   237.06

Note that we changed the confidence level to 0.90 in order to correspond to a one-sided hypothesis test with α = 0.05.


## Confidence intervals for one sample means (cont.)

Alternative calculation of confidence interval:

onesample.mean.ci = function(x, conf.level) {
  # Critical t value for a two-sided interval
  tstar = -qt(p = ((1 - conf.level) / 2), df = (length(x) - 1))
  xbar = mean(x)
  # Standard error of the sample mean
  sexbar = sd(x) / sqrt(length(x))
  cilower = xbar - tstar * sexbar
  ciupper = xbar + tstar * sexbar
  return(c(cilower, ciupper))
}
onesample.mean.ci(survey.sub$ageinmonths, 0.90)
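
One way to gain confidence in a hand-written interval is to compare it with the interval t.test reports. The sketch below does this on simulated data (an assumption, with an arbitrary seed); the function is renamed with underscores here only to keep the example self-contained.

```r
# Same calculation as the slide's function: xbar +/- t* x (s / sqrt(n))
onesample_mean_ci <- function(x, conf.level) {
  tstar <- -qt(p = (1 - conf.level) / 2, df = length(x) - 1)
  xbar <- mean(x)
  sexbar <- sd(x) / sqrt(length(x))
  c(xbar - tstar * sexbar, xbar + tstar * sexbar)
}

set.seed(1)                               # arbitrary seed
x <- rnorm(100, mean = 237, sd = 16)      # simulated stand-in for the sampled ages

manual <- onesample_mean_ci(x, 0.90)
builtin <- as.numeric(t.test(x, conf.level = 0.90)$conf.int)  # drop the attribute

manual
builtin
```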


## Hypothesis testing and CI for two sample means

Is there a difference between the ages of females and males? Construct a 95% confidence interval for the difference between the average ages of females and males.

t.test(survey.sub$ageinmonths[survey.sub$gender == "female"],
       survey.sub$ageinmonths[survey.sub$gender == "male"],
       alternative = "two.sided", conf.level = 0.95)

	Welch Two Sample t-test

data:  survey.sub$ageinmonths[survey.sub$gender == "female"] and survey.sub$ageinmonths[survey.sub$gender == "male"]
t = 1.25, df = 95.736, p-value = 0.2143
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.765100  7.768572
sample estimates:
mean of x mean of y
 238.1406  235.1389
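
The same comparison can be written more compactly with t.test's formula interface. The sketch uses a simulated data frame d (hypothetical values, arbitrary seed) and checks that both forms give the same Welch test.

```r
set.seed(42)                                        # arbitrary seed
d <- data.frame(
  gender = rep(c("female", "male"), times = c(64, 36)),
  ageinmonths = round(rnorm(100, mean = 237, sd = 16))  # simulated ages
)

# Vector interface, as on the slide
by_vector <- t.test(d$ageinmonths[d$gender == "female"],
                    d$ageinmonths[d$gender == "male"],
                    alternative = "two.sided", conf.level = 0.95)

# Formula interface: response ~ two-level grouping variable
by_formula <- t.test(ageinmonths ~ gender, data = d,
                     alternative = "two.sided", conf.level = 0.95)

by_vector$p.value
by_formula$p.value
```

With the formula form the groups are ordered by factor level (alphabetically here), so the difference is female minus male in both calls.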


64 out of 100 students in a random sample are females. Is there evidence to suggest that the population proportion of females is less than 65%? Use a 90% confidence level.

prop.test(64, 100, p = 0.65, alternative = "less", conf.level = 0.90)

data:  64 out of 100, null probability 0.65
X-squared = 0.011, df = 1, p-value = 0.4583
alternative hypothesis: true p is less than 0.65
90 percent confidence interval:
 0.0000000 0.7035286
sample estimates:
   p
0.64
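
The X-squared value above can be reproduced by hand. By default prop.test applies a continuity correction of 0.5; the sketch below computes the corrected statistic and the one-sided p-value under that assumption.

```r
x <- 64; n <- 100; p0 <- 0.65

res <- prop.test(x, n, p = p0, alternative = "less", conf.level = 0.90)

# Continuity-corrected chi-squared statistic: (|x - n*p0| - 0.5)^2 / (n*p0*(1 - p0))
chisq_manual <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))

# One-sided p-value from the signed square root; the sign is negative
# because the sample proportion (0.64) is below p0
p_manual <- pnorm(-sqrt(chisq_manual))

c(unname(res$statistic), chisq_manual)
c(res$p.value, p_manual)
```

Passing correct = FALSE to prop.test would drop the 0.5 adjustment and give the uncorrected z-test for a proportion.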


## Confidence intervals for one sample proportions

Just like t.test, the prop.test function will calculate both the upper and the lower bounds of the confidence interval only when alternative = "two.sided" is chosen. Otherwise a lower bound of 0 or an upper bound of 1 is produced.

prop.test(64, 100, p = 0.65, alternative = "two.sided", conf.level = 0.80)

data:  64 out of 100, null probability 0.65
X-squared = 0.011, df = 1, p-value = 0.9165
alternative hypothesis: true p is not equal to 0.65
80 percent confidence interval:
 0.5715825 0.7035286
sample estimates:
   p
0.64


54 out of 64 females and 32 out of 36 males are right handed. Is there evidence to suggest that the proportions of males and females who are right handed are different?

prop.test(c(54, 32), c(64, 36))

data:  c(54, 32) out of c(64, 36)
X-squared = 0.1051, df = 1, p-value = 0.7458
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.2026789  0.1124012
sample estimates:
   prop 1    prop 2
0.8437500 0.8888889

## 6 Linear Regression

## Scatterplots, Association, and Correlation

Scatterplots

Is there an association between amount of alcohol consumed and maximum speed?

plot(speed ~ alcohol, main = "Scatterplot of Speed vs. Alcohol", pch = 20, cex = 0.5)

[Figure: "Scatterplot of Speed vs. Alcohol"; x-axis: alcohol (0 to 80), y-axis: speed (0 to 150).]


Correlation

obs " )

[1] 0.2309745
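
The call on this slide is cut off in this copy. One plausible form, offered only as an assumption, is cor() with use = "complete.obs", which drops rows where either variable is missing. The sketch below shows that behaviour on simulated data; the variable names mirror the slide, but the values are invented.

```r
set.seed(7)                                   # arbitrary seed
alcohol <- rpois(50, 4)                       # simulated stand-in values
speed <- 90 + alcohol + rnorm(50, sd = 20)
speed[c(3, 17)] <- NA                         # introduce missing values

r_default <- cor(speed, alcohol)                        # NA when any value is missing
r_complete <- cor(speed, alcohol, use = "complete.obs") # uses complete pairs only

# Same result as filtering the complete cases by hand
ok <- complete.cases(speed, alcohol)
r_manual <- cor(speed[ok], alcohol[ok])

r_default
r_complete
```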


## Simple Linear Regression

Build a linear regression model predicting speed from alcohol.

summary(lm(speed ~ alcohol))

Call:
lm(formula = speed ~ alcohol)

Residuals:
    Min      1Q  Median      3Q     Max
-90.769  -8.725   1.275  11.275  91.541

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  88.7248     0.6511 136.261   <2e-16 ***
alcohol       0.9469     0.1108   8.549   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.83 on 1297 degrees of freedom
  (26 observations deleted due to missingness)
Multiple R-squared: 0.05335,	Adjusted R-squared: 0.05262
F-statistic: 73.09 on 1 and 1297 DF,  p-value: < 2.2e-16
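
Storing the fitted model makes its pieces available for further use. The sketch below fits the same kind of model on simulated data (an assumption; values and seed are arbitrary), extracts the coefficients, and checks that in simple linear regression R-squared equals the squared correlation.

```r
set.seed(7)                                      # arbitrary seed
d <- data.frame(alcohol = rpois(200, 4))         # simulated predictor
d$speed <- 90 + 1 * d$alcohol + rnorm(200, sd = 20)

fit <- lm(speed ~ alcohol, data = d)   # keep the model object
s <- summary(fit)

coef(fit)                              # intercept and slope
r2 <- s$r.squared                      # proportion of variance explained
r2_from_cor <- cor(d$speed, d$alcohol)^2

# Predicted speed at alcohol = 10: intercept + slope * 10
pred <- predict(fit, newdata = data.frame(alcohol = 10))
c(r2, r2_from_cor)
pred
```

The same relationship holds in the output above: 0.2309745 squared is approximately 0.05335.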

## 7 Online Resources for R

Download R: http://cran.stat.ucla.edu/
Search Engine for R: rseek.org
R Reference Card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf
UCLA Statistics Information Portal: http://info.stat.ucla.edu/grad/
UCLA Statistical Consulting Center: http://scc.stat.ucla.edu

## 8 Upcoming Mini-Courses

April 7, Tuesday: Introductory Statistics with R
April 9, Thursday: Basic R
April 14, Tuesday: Basic R
April 16, Thursday: Migrating to R for SAS/SPSS/Stata Users

For a schedule of all mini-courses offered, please visit http://scc.stat.ucla.edu/mini-courses.


Thank you
Any questions?

## 9 Exercises

1 Construct side-by-side box plots for the distribution of the amount of time it takes students to get to class (time) by their means of transportation (walk).
2 Usually younger students live on campus and older students live off campus. Is there evidence to suggest this trend in this data set? (Use a random sample of 100 students and α = 0.05.)
3 Calculate a 90% confidence interval for the difference between the average ages of students who live on campus and off campus.


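Solution to Exercise 1

plot(time ~ walk, main = "Time to get to class\nby type of transportation")

When walk is stored as a factor, plot() draws one box plot of time for each transportation category.

[Figure: side-by-side box plots titled "Time to get to class by type of transportation"; y-axis: Minutes (0 to 150); categories: bicycle, bus, car (by yourself), carpool, motorcycle, other, segway, skateboard, walk.]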
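
An equivalent and more explicit route is boxplot() with a formula. The sketch below uses simulated travel times for three hypothetical categories (an assumption, with an arbitrary seed) and sets plot = FALSE to return the box statistics instead of drawing.

```r
set.seed(3)                                      # arbitrary seed
d <- data.frame(
  walk = factor(rep(c("bicycle", "bus", "walk"), each = 20)),
  time = c(rexp(20, 1 / 10), rexp(20, 1 / 25), rexp(20, 1 / 15))  # simulated minutes
)

# One box per level of walk; plot = FALSE returns the numbers without drawing
b <- boxplot(time ~ walk, data = d, plot = FALSE)

b$names   # group labels, in factor-level order
b$stats   # one column per group: lower whisker, lower hinge, median, upper hinge, upper whisker
```

Dropping plot = FALSE and adding main and ylab arguments would draw the same kind of figure as the solution above.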


Solution to Exercise 2

t.test(survey.sub$ageinmonths[survey.sub$oncampus == "yes"],
       survey.sub$ageinmonths[survey.sub$oncampus == "no"],
       alternative = "less", conf.level = 0.95)

data:  survey.sub$ageinmonths[survey.sub$oncampus == "yes"] and survey.sub$ageinmonths[survey.sub$oncampus == "no"]
t = -5.3322, df = 34.867, p-value = 2.964e-06
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
      -Inf -10.85376
sample estimates:
mean of x mean of y
 232.6111  248.5000

The p-value (2.964e-06) is far below α = 0.05, so this sample provides evidence that students who live on campus are younger on average than those who live off campus.


Solution to Exercise 3

t.test(survey.sub$ageinmonths[survey.sub$oncampus == "yes"],
       survey.sub$ageinmonths[survey.sub$oncampus == "no"],
       alternative = "two.sided", conf.level = 0.90)

data:  survey.sub$ageinmonths[survey.sub$oncampus == "yes"] and survey.sub$ageinmonths[survey.sub$oncampus == "no"]
t = -5.3322, df = 34.867, p-value = 5.929e-06
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 -20.92402 -10.85376
sample estimates:
mean of x mean of y
 232.6111  248.5000
