Вы находитесь на странице: 1из 78

Introduction to Course

CE 311S
January 16, 2014
COURSE OVERVIEW
Introductions
1
What is your name?
2
Where are you from?
3
Something interesting about you...
Introduction to Course Course Overview
Introductions
1
What is your name?
2
Where are you from?
3
Something interesting about you...
Introduction to Course Course Overview
Course Sta and Oce Hours
Instructor: Steve Boyles, ECJ 6.204, T/Th 3:15-4:15
TAs: Tarun Rambha, Jack Bringardner
Tutor: Kelley Grabner
Introduction to Course Course Overview
Probability is the study of how random processes

behave.

Well take this to mean anything which is currently unknown, but


important for some reason.
Introduction to Course Course Overview Course Content
Real-world examples abound. Some recent headlines:
Introduction to Course Course Overview Course Content
Job creation is still steady despite worries
Introduction to Course Course Overview Course Content
Cold weather concerns
Introduction to Course Course Overview Course Content
Astronauts Get Taller in Space
Introduction to Course Course Overview Course Content
Some civil engineering examples...
Structures: What wind load should we design a building to withstand?
Transportation: Where will people live and work in 30 years?
Water Resources: How can we manage reservoirs knowing there will be
both droughts and ood years?
Construction Management: How long will a project actually take, and at
what cost?
Geotechnical: What kind of soil actually exists where the foundation will
be?
Introduction to Course Course Overview Course Content
Statistics describes how we can use experiments to learn about random
processes.
By combining probability and statistics, we have the tools to make
decisions even without perfect information.
Introduction to Course Course Overview Course Content
When I graduate, what will my job prospects be?
Introduction to Course Course Overview Course Content
What can past historical data tell us about how much energy is needed for
heating? How severe the u season will be?
Introduction to Course Course Overview Course Content
How can we design systems to support humans in new environments?
Introduction to Course Course Overview Course Content
Prerequisites
Probability and statistics involve a good deal of calculus. However, the
math is usually not what students nd dicult.
This may be one of the rst classes you see where the hard part is guring
out what concepts and formulas apply to which problems.
Introduction to Course Course Overview Course Content
Course Materials
The textbook is:
Devore, J. L. Probability and Statistics for Engineering and Science
You do not need the most recent edition.
I will also post lecture slides, but these cannot replace your own notes.
Introduction to Course Course Overview Course Content
Grading
Category Weight
Homeworks 30%
Exam 1 (February 27) 20%
Exam 2 (April 1) 20%
Final Exam (May 10) 30%
+/ grading will be used. If you need an extension, you must ask at least
48 hours in adavnce.
Introduction to Course Course Overview Course Content
Miscellanea
Consult catalog and departmental advisors for add/drop policy.
Please coordinate with me and Services for Students with Disabilities
if you have a disability requiring alternate accomodations.
Academic dishonesty... dont do it.
Introduction to Course Course Overview Course Content
DEFINITIONS
Population: All of the objects we are interested in (people, days of the
week, etc.)
Sample: The portion of the population we observe (e.g., survey and
data sets)
Variable: A characteristic which varies over the population (voting
preference, commute time, etc.)
Introduction to Course Denitions
How hard will it be to nd a job?
Population: Everyone who wants to work in the United States
Sample: 60,000 people contacted by a BLS survey
Variable: Whether a person wants a job, but doesnt have one.
Introduction to Course Examples
How hard will it be to nd a job?
Population: Everyone who wants to work in the United States
Sample: 60,000 people contacted by a BLS survey
Variable: Whether a person wants a job, but doesnt have one.
Introduction to Course Examples
How hard will it be to nd a job?
Population: Everyone who wants to work in the United States
Sample: 60,000 people contacted by a BLS survey
Variable: Whether a person wants a job, but doesnt have one.
Introduction to Course Examples
How hard will it be to nd a job?
Population: Everyone who wants to work in the United States
Sample: 60,000 people contacted by a BLS survey
Variable: Whether a person wants a job, but doesnt have one.
Introduction to Course Examples
How much will Texans spend on heating this winter?
Population: Each day of the winter
Sample: Weather records (for the past 100 years or so)
Variable: The temperature each day
Introduction to Course Examples
How much will Texans spend on heating this winter?
Population: Each day of the winter
Sample: Weather records (for the past 100 years or so)
Variable: The temperature each day
Introduction to Course Examples
How much will Texans spend on heating this winter?
Population: Each day of the winter
Sample: Weather records (for the past 100 years or so)
Variable: The temperature each day
Introduction to Course Examples
How much will Texans spend on heating this winter?
Population: Each day of the winter
Sample: Weather records (for the past 100 years or so)
Variable: The temperature each day
Introduction to Course Examples
How much taller do astronauts get in space?
Population: Each human being in the world
Sample: A few astronauts
Variable: A persons change in height when in space
Introduction to Course Examples
How much taller do astronauts get in space?
Population: Each human being in the world
Sample: A few astronauts
Variable: A persons change in height when in space
Introduction to Course Examples
How much taller do astronauts get in space?
Population: Each human being in the world
Sample: A few astronauts
Variable: A persons change in height when in space
Introduction to Course Examples
How much taller do astronauts get in space?
Population: Each human being in the world
Sample: A few astronauts
Variable: A persons change in height when in space
Introduction to Course Examples
A Birthday Example
What are the odds that two people in this room share the same birthday?
Introduction to Course Examples
MEASURES OF LOCATION
AND VARIABILITY
As a starting point, we need a way to briey summarize an entire sample
with simple numerical values.
This is the realm of descriptive statistics.
Introduction to Course Measures of Location and Variability
For now, we consider two main types of descriptive
statistic.
Measures of location describe what a typical value of the variable in a
sample.
Measures of variability describe how close the variables in the sample are
to a typical value.
We will dene three measures of location: the mean, median, and mode.
Introduction to Course Measures of Location and Variability
For now, we consider two main types of descriptive
statistic.
Measures of location describe what a typical value of the variable in a
sample.
Measures of variability describe how close the variables in the sample are
to a typical value.
We will dene three measures of location: the mean, median, and mode.
Introduction to Course Measures of Location and Variability
If we have a sample consisting of n members of the population, we let x
i
denote the value of the variable for the i -th member of the sample
(1 i n)
The (sample) mean is dened as
x =
x
1
+ x
2
+ . . . + x
n
n
=

n
i =1
x
i
n
Introduction to Course Measures of Location and Variability
Example
The high temperatures for the last week are
76 50 58 67 65 74 74
What is the mean temperature in this sample?
x =
x
1
+ x
2
+ . . . + x
n
n
=

n
i =1
x
i
n
Introduction to Course Measures of Location and Variability
Solution
x =
76 + 50 + 58 + 67 + 65 + 74 + 74
7
= 66.3
The mean temperature is 66.3 degrees.
Introduction to Course Measures of Location and Variability
A few notes
To be technically correct, this is a sample mean because it depends
on the sample we have. However, it is often called just the mean
when it is obvious it refers to a sample.
The sample mean is also known as the arithmetic average.
It is often called just the average, but there are other types of
averages too.
Introduction to Course Measures of Location and Variability
A few notes
To be technically correct, this is a sample mean because it depends
on the sample we have. However, it is often called just the mean
when it is obvious it refers to a sample.
The sample mean is also known as the arithmetic average.
It is often called just the average, but there are other types of
averages too.
Introduction to Course Measures of Location and Variability
A few notes
To be technically correct, this is a sample mean because it depends
on the sample we have. However, it is often called just the mean
when it is obvious it refers to a sample.
The sample mean is also known as the arithmetic average.
It is often called just the average, but there are other types of
averages too.
Introduction to Course Measures of Location and Variability
The (sample) median is dened as the middle value in the data set.
Using the temperature data we had before,
76 50 58 67 65 74 74
we can rewrite these in increasing order, and identify the middle value:
50 58 65 67 74 74 76
so for this sample, the median is 67 degrees.
Introduction to Course Measures of Location and Variability
If the sample size is odd, the middle value is well-dened. If the sample
size is even, there are two middle values.
In this case, the median is dened as the mean of the two middle values.
50 53 58 65 67 74 74 76
For this sample, the median is (65 + 67)/2 = 66 degrees.
Introduction to Course Measures of Location and Variability
The (sample) mode is dened as the most frequently occurring value in
the data set.
50 58 65 67 74 74 76
74 occurs most often (twice), so it is the sample mode.
Unlike the mean and median, there can be more than one mode! If there is a
tie for the most frequent observation, all of these values qualify as modes.
Introduction to Course Measures of Location and Variability
The (sample) mode is dened as the most frequently occurring value in
the data set.
50 58 65 67 74 74 76
74 occurs most often (twice), so it is the sample mode.
Unlike the mean and median, there can be more than one mode! If there is a
tie for the most frequent observation, all of these values qualify as modes.
Introduction to Course Measures of Location and Variability
So, which measure of location is best?
IT DEPENDS!
Introduction to Course Measures of Location and Variability
So, which measure of location is best?
IT DEPENDS!
Introduction to Course Measures of Location and Variability
Dierent measures of location tell dierent stories.
Lets say the proud country of Boylesland has three citizens, with net
worths of $20,000, $30,000, and $550,000, respectively.
What is the mean net worth and the median net worth?
Introduction to Course Measures of Location and Variability
Dierent measures of location tell dierent stories.
Next year, their net worths are $15,000, $25,000, and $860,000,
respectively.
What is the mean net worth and the median net worth?
Introduction to Course Measures of Location and Variability
Introduction to Course Measures of Location and Variability
One way to think about the mean is as the centroid or balance point of
the data set.
200k
20k 30k 550k
300k
15k 25k 860k
Introduction to Course Measures of Location and Variability
One lesson from the story is that the sample mean is highly aected by
outliers. Even just a few extremely high or extremely low values in a
sample can have a dramatic impact on the sample mean. On the other
hand, the median is robust to outliers.
So does this mean the median is better?
Introduction to Course Measures of Location and Variability
Lets go back to the rst situation, with net worths of $20,000, $30,000,
and $500,000, respectively.
Now, next year the net worths are $30,000, $30,000, and $540,000.
What are the new mean net worth and median net worth?
Introduction to Course Measures of Location and Variability
The moral of the story:
You always lose information when you reduce a data set to a single number.
A single statistic is only one facet of the problem. You can often get a better
view of the true situation by looking at multiple statistics.
Introduction to Course Measures of Location and Variability
MEASURES OF
VARIABILITY
None of the measures discussed so far address the variation within the
data set. All of the following samples have exactly the same mean,
median, and mode:
100 100 100 100 100 100
98 99 100 100 101 102
90 95 100 100 105 110
0 50 100 100 150 200
0 1 100 100 199 200
0 0 100 100 100 300
To distinguish between these, we develop measures of variability.
Introduction to Course Measures of Variability
None of the measures discussed so far address the variation within the
data set. All of the following samples have exactly the same mean,
median, and mode:
100 100 100 100 100 100
98 99 100 100 101 102
90 95 100 100 105 110
0 50 100 100 150 200
0 1 100 100 199 200
0 0 100 100 100 300
To distinguish between these, we develop measures of variability.
Introduction to Course Measures of Variability
There are a few ways to think about measures of variability.
They reect the consistency of the sample from one observation to
the next.
They describe the spread in the data.
If measures of location describe what a typical sample element looks
like, measures of variability show how typical a typical element is.
Introduction to Course Measures of Variability
The sample variance is dened as
s
2
=

(x
i
x)
2
n 1
The sample variance does not have the same units as the original sample;
because it is useful to have a measure of variability with the same units,
we dene the sample standard deviation as
s =

s
2
Introduction to Course Measures of Variability
The sample variance is dened as
s
2
=

(x
i
x)
2
n 1
The sample variance does not have the same units as the original sample;
because it is useful to have a measure of variability with the same units,
we dene the sample standard deviation as
s =

s
2
Introduction to Course Measures of Variability
Example
What are the sample variance and sample standard deviation of the
following temperature data?
76 50 58 67 65 74 74
Notice that the formula for sample variance (s
2
=

(x
i
x)
2
n1
) includes the
sample mean. We already calculated this to be 66.3. So
s
2
=
(76 66.3)
2
+ (50 66.3)
2
+ . . . + (74 66.3)
2
+ (74 66.3)
2
6
= 91
and s =

s
2
= 9.6 degrees.
Introduction to Course Measures of Variability
Example
What are the sample variance and sample standard deviation of the
following temperature data?
76 50 58 67 65 74 74
Notice that the formula for sample variance (s
2
=

(x
i
x)
2
n1
) includes the
sample mean. We already calculated this to be 66.3. So
s
2
=
(76 66.3)
2
+ (50 66.3)
2
+ . . . + (74 66.3)
2
+ (74 66.3)
2
6
= 91
and s =

s
2
= 9.6 degrees.
Introduction to Course Measures of Variability
Example
The data sets given at the beginning of these section have dramatically
dierent sample variance and sample standard deviation:
Data s
2
s
100 100 100 100 100 100 0 0
98 99 100 100 101 102 2 1.4
90 95 100 100 105 110 50 7.1
0 50 100 100 150 200 5000 71
0 1 100 100 199 200 ? ?
0 0 100 100 100 300 ? ?
Introduction to Course Measures of Variability
Example
The data sets given at the beginning of these section have dramatically
dierent sample variance and sample standard deviation:
Data s
2
s
100 100 100 100 100 100 0 0
98 99 100 100 101 102 2 1.4
90 95 100 100 105 110 50 7.1
0 50 100 100 150 200 5000 71
0 1 100 100 199 200 ? ?
0 0 100 100 100 300 ? ?
Introduction to Course Measures of Variability
Example
The data sets given at the beginning of these section have dramatically
dierent sample variance and sample standard deviation:
Data s
2
s
100 100 100 100 100 100 0 0
98 99 100 100 101 102 2 1.4
90 95 100 100 105 110 50 7.1
0 50 100 100 150 200 5000 71
0 1 100 100 199 200 7920 89
0 0 100 100 100 300 12000 110
Introduction to Course Measures of Variability
Example
The data sets given at the beginning of these section have dramatically
dierent sample variance and sample standard deviation:
Data s
2
s
100 100 100 100 100 100 0 0
98 99 100 100 101 102 2 1.4
90 95 100 100 105 110 50 7.1
0 50 100 100 150 200 5000 71
0 1 100 100 199 200 7920 89
0 0 100 100 100 300 12000 110
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
s
2
=

(x
i
x)
2
n 1
This kind of looks like an average... what if it were s
2
=

(x
i
x)
2
n
instead?
In this case, s
2
would be the average value of (x
i
x)
2
. What does this
mean?
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
s
2
=

(x
i
x)
2
n 1
This kind of looks like an average... what if it were s
2
=

(x
i
x)
2
n
instead?
In this case, s
2
would be the average value of (x
i
x)
2
. What does this
mean?
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
s
2
=

(x
i
x)
2
n 1
This kind of looks like an average... what if it were s
2
=

(x
i
x)
2
n
instead?
In this case, s
2
would be the average value of (x
i
x)
2
. What does this
mean?
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
(x
i
x)
2
is a measure of how far away each observation is from the sample
mean. But why square it?
Whats wrong with taking the average value of x
i
x?
What about taking the average value of |x
i
x|?
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
(x
i
x)
2
is a measure of how far away each observation is from the sample
mean. But why square it?
Whats wrong with taking the average value of x
i
x?
What about taking the average value of |x
i
x|?
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
s
2
=

(x
i
x)
2
n 1
So why do we divide by n 1 instead of n?
(Hint: s
2
=
n
n1

(x
i
x)
2
n
)
The sample mean is probably not the population mean. (In the temperature
example above, maybe the population mean is 66 degrees, while the mean
of our sample was 66.3 degrees). Thus, dividing by n would bias the sample
variance low because our estimate of the mean is inaccurate. Well talk
more about this later in the course.
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
s
2
=

(x
i
x)
2
n 1
So why do we divide by n 1 instead of n?
(Hint: s
2
=
n
n1

(x
i
x)
2
n
)
The sample mean is probably not the population mean. (In the temperature
example above, maybe the population mean is 66 degrees, while the mean
of our sample was 66.3 degrees). Thus, dividing by n would bias the sample
variance low because our estimate of the mean is inaccurate. Well talk
more about this later in the course.
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
s
2
=

(x
i
x)
2
n 1
So why do we divide by n 1 instead of n?
(Hint: s
2
=
n
n1

(x
i
x)
2
n
)
The sample mean is probably not the population mean. (In the temperature
example above, maybe the population mean is 66 degrees, while the mean
of our sample was 66.3 degrees). Thus, dividing by n would bias the sample
variance low because our estimate of the mean is inaccurate. Well talk
more about this later in the course.
Introduction to Course Measures of Variability
Where does the sample variance formula come from?
s
2
=

(x
i
x)
2
n 1
So why do we divide by n 1 instead of n?
(Hint: s
2
=
n
n1

(x
i
x)
2
n
)
The sample mean is probably not the population mean. (In the temperature
example above, maybe the population mean is 66 degrees, while the mean
of our sample was 66.3 degrees). Thus, dividing by n would bias the sample
variance low because our estimate of the mean is inaccurate. Well talk
more about this later in the course.
Introduction to Course Measures of Variability

Вам также может понравиться