STAT2303 UARK Week 02 - Sampling Methods and Probability Distribution

I. Section 1.4 Sampling Methods

A.
Definitions:
1.
Simple Random Sample: A
simple random sample of n subjects is
selected in such a way that every
possible sample of the same size n has
the same chance of being chosen.
2.
Random Sample: Every
individual member has an equal
chance of being selected from the
population.

**Example: Consider the selection of three students
from a class of six students. If you use a coin toss
to make each selection, where heads = female and
tails = male, then randomness is used and each
student has the same chance of being selected, but
the result is not a simple random sample.
With the coin toss, some groups of students do
not have an equal chance of being selected. Look at
our sample space for the selection of three students:
{bbb, bbg, bgb, gbb, bgg, gbg, ggb, ggg}. As we
recall from our previous lesson, the group bbb
has only a 1/8 probability of being selected, but
two boys and a girl has a 3/8 probability of being
selected.
Thus, although our sample is a random sample
(each individual has the same chance of being
selected), it is not a simple random sample
because not all groups have the same chance of
being selected.

B.
Sampling Methods
1.
Systematic Sampling: Select
some starting point, then select every
kth member of the group.
2.
Convenience Sampling: Using
results that are readily available or
that are easy to get.
3.
Stratified Sampling: We first
subdivide the population into at least
two different subgroups (or strata),
and then draw a sample from each subgroup.
4.
Cluster Sampling: Divide the
population into sections (or clusters),
then randomly select some of those
clusters, and then sample all members
of the selected clusters.
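
The sampling methods defined in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 20 ID numbers, the two strata, the four clusters, and all function names are hypothetical and do not come from the notes.

```python
import random


def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick a random starting point among the first k members,
    then take every kth member after it."""
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Draw a separate sample from each stratum (subgroup)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 members identified by the numbers 1..20.
population = list(range(1, 21))
strata = {"first_half": population[:10], "second_half": population[10:]}
clusters = {"c1": population[0:5], "c2": population[5:10],
            "c3": population[10:15], "c4": population[15:20]}

srs = simple_random_sample(population, 5, rng)
systematic = systematic_sample(population, 4, rng)
stratified = stratified_sample(strata, 2, rng)
clustered = cluster_sample(clusters, 2, rng)

print("simple random:", srs)
print("systematic   :", systematic)
print("stratified   :", stratified)
print("cluster      :", clustered)
```

Convenience sampling is left out on purpose: it involves no random selection, so there is nothing to simulate.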
II. Section 2.2 Frequency Distributions

A.
Definitions
1.
Frequency Distribution: Shows
how data is partitioned among several
categories (classes) by listing the
classes along with the number
(frequency) of data values in each of
them.

Example: IQ Scores of Low Lead Test Group

IQ Score    Frequency
50-69       2
70-89       33
90-109      35
110-129     7
130-149     1

2.
Cumulative Frequency: The
sum of the frequency for a class and
the frequencies of all previous classes.

IQ Score    Frequency    Cuml. Freq
50-69       2            2
70-89       33           35
90-109      35           70
110-129     7            77
130-149     1            78
III. Section 2.3 Histograms

A.
Normal Distribution
1.
When graphed as a histogram
or listed in a frequency distribution
table, a normal distribution has a bell
shape: (1) the frequencies increase to
a maximum and then decrease, and
(2) the graph has symmetry, with the
left half being close to a mirror
image of the right half.

3.
Relative Frequency: The
proportion or percent of the
observations that fall within each
class.

*This is obtained by dividing the frequency by the
last number in the cumulative frequency column,
which is the total number of data values.

IQ Score    Frequency    Cuml. Freq    Probability    Relative Freq.
50-69       2            2             2/78           3%
70-89       33           35            33/78          42%
90-109      35           70            35/78          45%
110-129     7            77            7/78           9%
130-149     1            78            1/78           1%
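
The cumulative and relative frequency columns can be reproduced from the frequency column alone. The following is a minimal sketch using the IQ score table from these notes; the variable names are illustrative.

```python
from itertools import accumulate

# Classes and frequencies copied from the IQ score table in the notes.
classes = ["50-69", "70-89", "90-109", "110-129", "130-149"]
frequencies = [2, 33, 35, 7, 1]

# Cumulative frequency: running total of the frequencies.
cumulative = list(accumulate(frequencies))

# The last cumulative frequency is the total number of data values.
total = cumulative[-1]

# Relative frequency: class frequency divided by the total.
relative = [f / total for f in frequencies]
percents = [round(100 * r) for r in relative]

for name, f, c, p in zip(classes, frequencies, cumulative, percents):
    print(f"{name:<10}{f:>4}{c:>6}{p:>5}%")
```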

4.
Class Width: The difference
between two consecutive lower class limits.
*Example: In the table above, the differences
between consecutive lower class limits are
70 - 50 = 20, 90 - 70 = 20, 110 - 90 = 20, and
130 - 110 = 20. Thus the class width is 20.
5.
Class Boundary: The numbers
used to separate the classes, but
without the gaps left by the class limits.

*Example: Notice that 69.5, 89.5, and 109.5 do not
appear among our class limits. They are class
boundaries: each lies halfway between the upper
limit of one class and the lower limit of the next
(69.5 lies between 69 and 70), so the classes meet
with no gaps between them.

6.
Class Midpoint: The value of
the middle of the classes. It is
computed by adding the two
consecutive lower class boundaries
and dividing by 2.
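
As a quick check of the class width, class boundary, and class midpoint definitions, the sketch below applies them to the class limits of the IQ score table. The variable names are illustrative. The midpoints are computed from each class's lower and upper limits, which gives the same result as averaging two consecutive class boundaries.

```python
# Class limits copied from the IQ score table in the notes.
lower_limits = [50, 70, 90, 110, 130]
upper_limits = [69, 89, 109, 129, 149]

# Class width: difference between consecutive lower class limits.
widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]

# Class boundaries: halfway between the upper limit of one class
# and the lower limit of the next class.
boundaries = [(upper + lower) / 2
              for upper, lower in zip(upper_limits, lower_limits[1:])]

# Class midpoints: halfway between the lower and upper limits of a class.
midpoints = [(lower + upper) / 2
             for lower, upper in zip(lower_limits, upper_limits)]

print("widths    :", widths)
print("boundaries:", boundaries)
print("midpoints :", midpoints)
```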
IV. Section 1.3 Types of Data

A.
Definitions
1.
Parameter: A numerical
measurement describing some
characteristic of a population.
2.
Statistic: A numerical
measurement describing some
characteristic of a sample.

Example: The population size of 241,472,385 is a
parameter, because it is based on the entire
population of the United States. The sample size
of 2320 surveyed adults in the United States is a
statistic, because it is based on a sample. The
value of 5% would also be a statistic, because it is
also based on the sample.

3.
Quantitative (numerical) Data:
Consists of numbers
representing counts or
measurements.
4.
Categorical Data: Consists of
names or labels that are not numbers
representing counts or measures.

**Example: The ages (in years) of survey
respondents are quantitative data. Party
affiliations such as Republican or Democratic are
categorical data. The numbers on the backs of
Arkansas Razorbacks football jerseys are categorical
data, since the numbers do not represent any
count or measurement.

5.
Discrete Data: Results when
the data are quantitative and the
number of values is finite or
countable (such as whole numbers).
6.
Continuous Data: Results
when there are infinitely many
possible quantitative values that
are not countable.
B.
Pareto Charts: This is a bar graph
for categorical data, with the added
stipulation that the bars are arranged in
descending order according to frequency.
The vertical scale of a Pareto chart
represents frequencies or relative
frequencies and the horizontal scale
represents different categories of
qualitative data.
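
The defining step of a Pareto chart, arranging the bars in descending order of frequency, can be sketched as follows. The category counts are hypothetical values made up for illustration, and the text bars only stand in for a real plot.

```python
# Hypothetical category counts, invented for this illustration only.
counts = {"Republican": 12, "Democratic": 15, "Independent": 8}

# A Pareto chart orders the categories from highest to lowest frequency.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Print one text bar per category, tallest first.
for category, frequency in ordered:
    print(f"{category:<12} {'#' * frequency} {frequency}")
```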

**Example: The number of eggs that hens lay in
one week is discrete data, because eggs are laid in
whole numbers, i.e. a hen cannot lay half an egg.
The number of gallons of milk a cow produces
in a year is continuous data, because a cow may
produce any amount between 0 and a maximum.

V. Section 2.4 Graphs that Enlighten and Graphs that Deceive
A.
Bar Graphs: use bars of equal
width to show frequencies of categories
of categorical data. The vertical scale
represents the frequencies or relative
frequencies and the horizontal scale
identifies the different categories.

1.
Pie Chart: A graph that depicts
categorical data as slices of a circle, in
which the size of each slice is
proportional to the frequency count
for the category.
5.
Probability Histogram: a graph
consisting of bars of equal width
drawn adjacent to each other. The
horizontal scale represents classes of
quantitative data and the vertical
scale represents the probability.

*** Note that pie charts are not as good at
displaying information as Pareto Charts because
they are visually less informative.

VI. Section 5.2 Probability Distributions

A.
Definitions:
1.
Random Variable: a variable
(typically represented by x) that
has a single numerical value,
determined by chance, for each
outcome of a procedure.
2.
Discrete Random Variable: a
variable that has a collection of values
that is finite or countable (such as
whole numbers).
3.
Continuous Random Variable:
has infinitely many values, and the
collection of values is not countable
(the values are not limited to whole
numbers).
4.
Probability Distribution: A
description that gives the probability
for each value of the random variable.
It is often expressed in the format of
a table, formula, or graph.
B.
Requirements for a Probability
Distribution
1.
There is a numerical random
variable x and its values are
associated with corresponding
probabilities.
2.
ΣP(x) = 1 where x assumes
all possible values. That is, the sum of
all the probabilities must be equal to
1, although small rounding errors are
acceptable.
3.
0 ≤ P(x) ≤ 1 for every
individual value of the random
variable x. This is just like before, with
the probability being between 0 and 1
inclusive.
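
The two numerical requirements (the probabilities sum to 1, and each probability lies between 0 and 1) can be checked mechanically. This is a minimal sketch: the function name and the rounding tolerance are assumptions. The example distribution is the number of girls among three births, taken from the sample space {bbb, bbg, bgb, gbb, bgg, gbg, ggb, ggg} used earlier in these notes.

```python
import math


def is_probability_distribution(probabilities, tol=1e-3):
    """Check the requirements for a table of x -> P(x).

    tol is an assumed allowance for small rounding errors in the sum.
    """
    each_in_range = all(0 <= p <= 1 for p in probabilities.values())
    sums_to_one = math.isclose(sum(probabilities.values()), 1.0, abs_tol=tol)
    return each_in_range and sums_to_one


# x = number of girls among three births (each of the 8 outcomes has P = 1/8).
girls = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

# Two tables that break a requirement (made-up values).
sum_too_large = {0: 0.5, 1: 0.7}
out_of_range = {0: -0.2, 1: 1.2}

print(is_probability_distribution(girls))
print(is_probability_distribution(sum_too_large))
print(is_probability_distribution(out_of_range))
```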
C.
Identifying Unusual Results with
the Range Rule of Thumb
1.
A crude but simple tool for
understanding and interpreting
standard deviation. The center and
spread of a distribution can be used to
calculate a maximum usual value and a
minimum usual value. If a value falls
outside the range between the minimum
and maximum usual values, then
it is deemed unusual.

**This is a guesstimate only and not a rigid rule.
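
The notes do not spell out the calculation. The sketch below assumes the common textbook form of the range rule of thumb: minimum usual value = mean - 2 x standard deviation, and maximum usual value = mean + 2 x standard deviation. The function names and the mean of 100 with standard deviation 15 are hypothetical values chosen only for illustration.

```python
def usual_range(mean, std_dev):
    """Return (minimum usual value, maximum usual value).

    Assumes the common form of the rule: mean -/+ 2 standard deviations.
    """
    return mean - 2 * std_dev, mean + 2 * std_dev


def is_unusual(value, mean, std_dev):
    """A value is deemed unusual if it falls outside the usual range."""
    low, high = usual_range(mean, std_dev)
    return value < low or value > high


# Hypothetical distribution with mean 100 and standard deviation 15.
low, high = usual_range(100, 15)
print("usual values run from", low, "to", high)
print("135 unusual?", is_unusual(135, 100, 15))
print("110 unusual?", is_unusual(110, 100, 15))
```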

D.
Using Probabilities to determine
when results are unusual
1.
Rare Event Rule: If, under a
given assumption, the probability of
an observed event is very small, then
we conclude that the assumption is
probably not correct.
a)
x successes among n
trials is an unusually high
number of successes if the
probability of x or more
successes is 0.05 or less:
P(x or more) ≤ 0.05.
b)
x successes among n
trials is an unusually low
number of successes if the
probability of x or fewer
successes is 0.05 or less:
P(x or fewer) ≤ 0.05.

VII. Section 6.2 The Standard Normal Distribution

A.
Uniform Distributions
1.
A continuous random variable
has a uniform distribution if its values
are spread evenly over the range of
possibilities. The graph of a uniform
distribution is rectangular in shape.
B.
Density Curves
1.
The graph of a continuous
probability distribution that satisfies
the following two requirements: (1)
the total area under the curve equals
1, and (2) every point on the curve
must have a vertical height that is zero
or greater.
C.
Area and Probability
1.
Because the total area under
the density curve is equal to 1, there
is a correspondence between area
and probability.

**Example: Given the uniform distribution
illustrated, find the probability that a randomly
selected voltage level is greater than 124.5 volts.

Since the area is a rectangle, and the area of a
rectangle is base x height, we find the probability
by multiplying the distance between the values on
the x axis, 125.0 - 124.5 = 0.5, by the height of the
density curve, 0.5.

Thus the problem becomes:
(125.0 - 124.5) x 0.5 = 0.5 x 0.5 = 0.25

So the probability of randomly selecting a voltage
level greater than 124.5 volts is 0.25.

** Example: For New York City weekday late-afternoon subway travel from Times Square to the Mets
Stadium, you can take the #7 train that leaves Times Square every 5 minutes. Given the subway
departure schedule and the arrival of a passenger, the waiting time is x between 0 and 5 minutes, as
described by the uniform distribution depicted below. Note that waiting times can have any value
between 0-5 minutes so it is possible to have a waiting time of 2.33457 minutes and that all waiting times
are equally likely.

Given the uniform distribution, find the probability that a randomly selected passenger has a waiting
time of more than 2 minutes.
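
One way to work this out is the same rectangle-area reasoning: waiting times are spread evenly over 0 to 5 minutes, so the density curve has height 1/5 = 0.2, and the probability is the area of the rectangle from 2 to 5. The sketch below is an illustration only; the function name and its parameters are assumptions, not part of the notes.

```python
def uniform_probability(low, high, a, b):
    """P(a < x < b) for x uniformly distributed between low and high.

    The probability is the area of a rectangle: base x height,
    where height = 1 / (high - low) so that the total area is 1.
    """
    base = max(min(b, high) - max(a, low), 0)  # overlap with [low, high]
    return base / (high - low)


# Waiting time is uniform between 0 and 5 minutes; find P(wait > 2).
p_wait_over_2 = uniform_probability(low=0, high=5, a=2, b=5)
print(round(p_wait_over_2, 4))
```

Under these assumptions the base is 5 - 2 = 3 and the height is 0.2, so the probability of waiting more than 2 minutes comes out to 3 x 0.2 = 0.6.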
