1–3)
**Information in dashed boxes will be covered in class**
Section 1.1—Introduction
Definitions:
______________________________: how to collect, organize, summarize and analyze
information so that conclusions can be drawn with a
measure of confidence
o ___________________________________: organizing and summarizing collected data
o ___________________________________: methods that extend sample results to (generalize
about) the population with a measure of reliability
______________________________: entire group of interest in a study
______________________________: a single person or object being studied
______________________________: subset of the population being studied
______________________________: numerical summary of a sample computed
from
______________________________: numerical summary of a population ____________
______________________________: characteristics of individuals in the population
o ______________________________________: responses based on attributes or characteristics
o _______________________________________: responses are numerical measures
______________________________: finite (countable) number of possible values,
can be listed without skipping any
______________________________: infinite number of possible values, can have
more and more decimal places
1. In a study of all 2223 passengers aboard the Titanic, it is found that 706 survived when
it sank.
2. A recent survey of a sample of MBAs reported that the average salary for an MBA is
more than $82,000. (Source: The Wall Street Journal)
3. Starting salaries for the 667 MBA graduates from the University of Chicago Graduate
School of Business increased 8.5% from the previous year.
4. In a recent poll of Salt Lake Community College students, 83% of students owned a
vehicle.
5. In a recent survey of 457 attendees of Jumanji, 402 would recommend the movie to a
friend.
Exercise: Are the following a discrete or a continuous data set?
1. In a survey of 1059 adults, it is found that 39% of them have guns in their homes.
2. The number of heads obtained after flipping a coin five times.
3. The distance Tiger Woods can drive a golf ball.
4. Points scored in a basketball game.
5. Volume of water lost each day from a leaky faucet.
6. Length of a song.
7. Number of words in a song.
1. Gender
2. Temperature
3. Nation of origin
4. Number of siblings
5. Number of day
6. Grams of carbohydrates in a donut
7. Phone number
8. Value of a house
9. Zip code
Exercise: The following information relates to the 2011 model year product line of BMW
automobiles. Identify the individual being studied, the variables, and whether the data
corresponding to each variable are qualitative, quantitative, continuous or discrete.
Model      Body Style      Weight   Number of Seats
3 Series   Coupe           3362     4
5 Series   Sedan           4056     5
6 Series   Convertible     4277     4
7 Series   Sedan           4564     5
X3         Sport Utility   4012     5
Z4         Coupe           3505     2
1. Rats with cancer are divided into two groups. One group received 5 milligrams of a medication
that is thought to fight cancer, and the other received 10 milligrams. After 2 years, the spread
of cancer is measured.
2. Conservation agents netted 250 large-mouth bass in a lake and determined how many were
carrying parasites.
3. Seventh grade students are randomly divided into two groups. One group is taught math using
traditional techniques; the other is taught math using a reform method. After 1 year, each
group is given an achievement test to compare proficiency.
4. A survey was conducted asking 400 people, “Do you prefer Coke or Pepsi?”
c) Can we conclude drinking six or more cups of coffee reduces the chance of nonmelanoma skin
cancer?
Exercise: Get Married, Gain Weight – Are young couples who marry or cohabitate more
likely to gain weight than those who stay single? Researchers followed 8000 men and women for 7
years. At the start of the study, none of the participants were married or living with a romantic
partner. The researchers found that women who married or cohabitated during the study period
gained 9 pounds more than single women, and married or cohabitating men gained, on average, 6
pounds more than single men.
b) What is the response variable in the study? What is the explanatory variable?
d) Can we conclude that getting married or cohabitating causes one to gain weight?
Section 1.3—Simple Random Sampling
Definitions:
______________________________: using chance (an objective device) to select individuals from a
population to be included in the sample
______________________________: every sample of a certain size is equally likely (and every
subject has an equal chance of being selected)
Notation:
__________ population size
__________ sample size
______________________________: list of all individuals in a population being studied.
______________________________: once selected, an individual cannot be chosen again.
______________________________: once selected, individuals are placed back in the population
and can be chosen again.
Examples of random devices for selecting a Simple Random Sample: draw names from a hat;
computer software program/website, random number generator; random number table.
Step 1: Randomly find a starting point
Step 2: Using as many columns as digits needed, move through the table, skipping
numbers outside the range of the frame or ones that have already been used,
until you reach your sample size.
Population: N=10; Sample, n=4; start at row 5/column 26 and move across:
Population: N=65; Sample, n=6; start at row 16/column 14 and move downward:
Population: N=304; Sample, n=8; start at row 18/column 27 and move downward:
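The two steps above can be sketched in Python. This is only an illustration under stated assumptions: the helper name `sample_from_digits` and the row of digits are made up for the example, and the digits are read left to right in fixed-width groups rather than from an actual printed table.

```python
def sample_from_digits(digits, population_size, sample_size):
    """Read fixed-width numbers from a string of random digits.

    Numbers outside 1..population_size and repeats are skipped,
    mirroring Step 2 of the table procedure.
    """
    width = len(str(population_size))  # columns needed per number
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        value = int(digits[start:start + width])
        if 1 <= value <= population_size and value not in chosen:
            chosen.append(value)
        if len(chosen) == sample_size:
            return chosen
    raise ValueError("ran out of digits before reaching the sample size")


# Hypothetical row of random digits (not taken from any particular table).
row = "4192706358817320954461"
print(sample_from_digits(row, population_size=65, sample_size=6))
# [41, 63, 58, 20, 44, 61]
```

With N = 65 the digits are read two at a time; 92, 70, 81, 73 and 95 are skipped because they fall outside the frame.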
Exercise: Sophia has 4 tickets to a concert and 10 friends that she would like to invite. Which of
the following would produce a simple random sample of the 4 friends that she will bring with
her:
Mike, Jamie, Adam, Yvette, Ashley, Monica, Cherie, Julie, Willard, Bruce
a. List each person’s name on a separate piece of paper, place them in a hat and draw 4.
b. List the names in alphabetical order and take the first 4 names.
c. Ask one of her friends who she would bring.
d. Number the friends from 1 to 10 and use a random number generator to produce 4 numbers
between 1 and 10 that correspond to the 10 friends.
Use the TI graphing calculator to obtain a random sample of the 4 friends that she will bring
with her and give their names.
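For readers without a TI calculator, a random number generator in software plays the same role. The sketch below uses Python's standard `random.sample`, which draws without replacement; the function name `simple_random_sample` and the optional `seed` argument are assumptions made for this illustration.

```python
import random

friends = ["Mike", "Jamie", "Adam", "Yvette", "Ashley",
           "Monica", "Cherie", "Julie", "Willard", "Bruce"]


def simple_random_sample(frame, n, seed=None):
    """Draw n individuals without replacement; every group of size n is equally likely."""
    rng = random.Random(seed)  # seed is optional and only makes the draw repeatable
    return rng.sample(frame, n)


invited = simple_random_sample(friends, 4)
print(invited)
```

Each run gives a different group of four unless a seed is supplied.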
Exercise: To complete the Citizenship in the World merit badge, one must select 3 of the
following eight organizations and describe their role in the world. The list of digits below is
from a random number generator using technology.
7, 4, 4, 7, 3, 6, 2, 1, 9, 9, 5
United Nations, The World Court, The World Health Organization, CARE,
Amnesty International, The Red Cross, World Trade Organization, The World Bank
Exercise: A student completing an associate’s degree is required to select two courses from
the following list of courses as part of the program.
CRN 1280 Fieldwork Methods and Research
b. What is the chance that a student will pick the combination CRN 1040 and CRN 1060?
Exercise: Suppose you are the CEO of a coffee shop chain and you wish to conduct a survey to
determine the length of time customers are in line. Your administrative assistant provides
you with a list of the 674 coffee shops in your chain.
a. Discuss a procedure you could follow to obtain a simple random sample of 5 coffee shops
for your survey.
b. Obtain a sample using the following random number table:
86571 03875
17245 55042
07498 57343
73278 71956
16387 36139
02561 04736
84702 36139
Exercise: The owner of a private food store is concerned with employee morale. She decides to
survey the employees to learn about work environment and job satisfaction. Obtain a simple
random sample of size 4 from the table below using the random number table provided.
10, 37, 22, 46, 03, 15, 02, 18, 14, 05, 27
Section 1.4—Other Effective Sampling Methods
Definitions:
______________________________: a simple random sample is drawn from each nonoverlapping
subgroup that the population has been separated into (“strata”)
Works best when: individuals in each stratum are similar in some way and we
want to make sure each stratum is represented; the size of
each simple random sample is proportional to the number of
individuals in the stratum from which it is selected
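A minimal sketch of proportional stratified sampling, assuming the strata are already known. The function name `stratified_sample`, the employee labels, and the stratum sizes are hypothetical; rounding each stratum's share is one simple allocation choice, not the only one.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Proportional stratified sample: a simple random sample from each stratum.

    strata maps a stratum name to the list of its individuals.
    Each stratum contributes round(n * stratum_size / N) individuals,
    so the total can differ slightly from n when shares do not divide evenly.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: 60 day-shift and 40 night-shift employees.
strata = {
    "day": [f"D{i}" for i in range(1, 61)],
    "night": [f"N{i}" for i in range(1, 41)],
}
picked = stratified_sample(strata, n=10)
print({name: len(members) for name, members in picked.items()})
# {'day': 6, 'night': 4}
```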
______________________________: a starting point is randomly selected (p) and then every kth
subject is included in the sample
k = N/n, rounded down to the nearest integer
the n subjects in the sample are numbered:
p, p + k, p + 2k, p + 3k, . . . , p + (n-1)k
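The numbering p, p + k, ..., p + (n-1)k can be generated directly. In this sketch the function name `systematic_sample` is made up, and choosing the random start p among the first k individuals is an assumption (a common convention); the values N = 100, n = 8, p = 5 are hypothetical.

```python
import random


def systematic_sample(N, n, p=None, seed=None):
    """Return the positions p, p + k, ..., p + (n - 1)k with k = N // n."""
    k = N // n  # N/n rounded down to the nearest integer
    if p is None:
        p = random.Random(seed).randint(1, k)  # random start among the first k
    return [p + i * k for i in range(n)]


print(systematic_sample(N=100, n=8, p=5))
# [5, 17, 29, 41, 53, 65, 77, 89]
```

Here k = 100 // 8 = 12, so every 12th individual after the 5th is selected.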
______________________________: sampling subjects who are easy to get (bad sampling method)
Works best when: NEVER!! (not random)
Exercise: The human resources department at a certain company wants to conduct a survey
regarding worker morale. The department has a list of all 400 employees at the company and
wants to do a systematic sample of size 20. To do this, they randomly select person number 8
as the starting point and then select every 20th person. List the corresponding workers to be
surveyed.
Section 1.5—Bias in Sampling
Definitions:
______________________________: when sample results are not representative of the population
______________________________: bias because the technique used to select the sample favors
some individuals over others (not random)
o ______________________________: type of sampling bias where part of the population has
a lower chance or no chance of being in the sample (frame
is incomplete)
______________________________: bias because individuals in the sample do not respond
______________________________: bias because responses given are not accurate
Types of Questions:
I am going to leave it to you to read chapter 1.5. This chapter is 3 pages long and is a
must-read section about sources of bias in sampling.
Exercise: In a group, create some questions that will introduce bias from respondents.
Section 2.1—Organizing Qualitative (Categorical) Data
Definitions:
A ___________________________________ lists each category of data and the number of occurrences for each category.
A __________________________________ lists each category of data and the proportion (or percent) of
observations in that category.
Relative frequency =
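A small sketch of both tables in Python: the frequency is a count per category, and the relative frequency is that count divided by the total number of observations. The function name `frequency_tables` and the list of days are hypothetical and are not the class data.

```python
from collections import Counter


def frequency_tables(observations):
    """Return (frequency, relative_frequency) dictionaries keyed by category."""
    frequency = Counter(observations)
    total = sum(frequency.values())
    relative = {category: count / total for category, count in frequency.items()}
    return dict(frequency), relative


# Hypothetical responses, not the class data.
days = ["Fri", "Sat", "Sat", "Sun", "Fri", "Sat", "Wed", "Sat"]
freq, rel = frequency_tables(days)
print(freq)  # {'Fri': 2, 'Sat': 4, 'Sun': 1, 'Wed': 1}
print(rel)   # {'Fri': 0.25, 'Sat': 0.5, 'Sun': 0.125, 'Wed': 0.125}
```

The relative frequencies always add up to 1, which is a quick check on the table.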
Day of the Week Tally Frequency Day of the Week Relative Frequency
Sunday Sunday
Monday Monday
Tuesday Tuesday
Wednesday Wednesday
Thursday Thursday
Friday Friday
Saturday Saturday
[Blank bar graph axes: vertical axis labeled Frequency, with scales 0 to 14 and 0 to 0.35]
A ________________ is a circle divided into sectors. Each sector represents a
category of data and its area is proportional to the frequency.
e. If you own a restaurant, which days would you purchase an advertisement on the local
radio? Are there days that you should avoid?
Response            Relative Frequency
Never
Rarely
Sometimes
Most of the time
Always
d. Suppose a representative from the Center says, “52.7% of college students always wear a
seatbelt.” Is this a descriptive or inferential statement? Why?
e. Construct a relative frequency bar graph.
[Blank bar graph axes. Title: College Survey; vertical axis: Relative Frequency, 0 to 0.6; horizontal axis: Response (Never, Rarely, Sometimes, Most of the time, Always)]
f. If a certain college has 12,728 students, how many would we expect to never wear a seat
belt?
a. If there were 10 million cases of identity fraud in a recent year, how many were credit card
fraud?
b. If the commission claimed, “The results indicate that 17% of the fraud committed in that
year was from Utilities fraud”, would you say this statement is descriptive or inferential?
Why?
Exercise: Desirability Attributes – A random sample of 2163 adults (aged 18 and over) was
asked, “Given a choice of the following, which one would you most want to be?” The results
of the survey are presented in the side-by-side bar graph.
b. Which age group has a majority of respondents who are less likely to buy when
made in America?
c. What is the apparent association between age and likelihood to buy when made in
America?
Section 2.2—Organizing Quantitative Data
Definitions:
A ___________________ is for qualitative data, while a __________________________ is used for discrete
quantitative data or continuous quantitative data. Data is then grouped into _______________.
Notes on histograms:
(1) Bars have equal widths
(2) Bars are touching
______________________________: smallest value in a class
______________________________: largest value in a class
______________________________: difference between consecutive lower class limits (or
consecutive upper class limits); NOT the difference between the lower
and upper class limits of a class
______________________________: a type of table where the first class has no lower limit and/or
the last class has no upper limit
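A sketch of how observations are sorted into classes once a first lower class limit and a class width are chosen. The function name `class_frequencies` and the data values are hypothetical. Each class runs from its lower limit up to, but not including, the next lower limit, and classes with no observations are simply left out of the result.

```python
import math


def class_frequencies(data, lower_limit, width):
    """Count observations per class, keyed by each class's lower limit.

    Class i covers lower_limit + i*width up to (not including)
    lower_limit + (i+1)*width.
    """
    counts = {}
    for x in data:
        index = math.floor((x - lower_limit) / width)
        class_lower = lower_limit + index * width
        counts[class_lower] = counts.get(class_lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical data values.
data = [3, 12, 17, 21, 25, 29, 34, 8, 15, 22]
print(class_frequencies(data, lower_limit=0, width=10))
# {0: 2, 10: 3, 20: 4, 30: 1}
```

The keys 0, 10, 20, 30 are consecutive lower class limits, so their common difference (10) is the class width.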
Identify the Shape of a Distribution
One way that a quantitative variable is described is through the shape of its distribution.
Exercise: Cigarette Tax Rates – The table shows the tax, in dollars, on a pack of cigarettes in each
of the 50 states and Washington DC as of January 2014.
c. Construct a relative frequency distribution with lower class limit 0 and a class width of 0.10.
d. Construct a relative frequency histogram
e. Does one frequency distribution provide a better summary of the data than the other? Explain.
Exercise: Predicting School Enrollment – To predict future
enrollment in a school district, fifty households within the district
were sampled and asked to disclose the number of children under
the age of five living in the household. The results of the survey are
presented here:
Exercise: Cigarette Tax Rates – The table shows the tax, in dollars, on a pack of cigarettes in each
of the 50 states and Washington DC as of January 2014.
f. Construct a relative frequency distribution with lower class limit 0 and a class width of 0.50.
g. Construct a relative frequency histogram
h. Construct a relative frequency distribution with lower class limit 0 and a class width of 0.10.
i. Construct a relative frequency histogram
j. Does one frequency distribution provide a better summary of the data than the other? Explain.
A ______________________________ is another graph for quantitative data, created by placing a dot above
the value for each observation across a horizontal number line in increasing order.
Exercise: Wendy’s Arrival Times – The manager at Wendy’s fast-food restaurant wants to know the
typical number of customers who arrive during the lunch hour. The following data
represent the number of customers who arrive at Wendy’s for 40 randomly selected 15-minute
intervals of time during lunch. Construct a dot plot and identify the distribution.
1 2 3 4 5 6 7 8 9 10 11
Number of Customers
Section 2.3—Graphical Misrepresentations of Data (Bad Graphs)
1. Misrepresentation of Data (Circle the good graph; cross out the bad graph)
2. Manipulating the Vertical Scale (Circle the good graph; cross out the bad graph)
3. Inappropriate dimensions (Circle the good graph; cross out the bad graph)
What’s wrong with the
bad graph?
Section 3.1—Measures of Central Tendency
Advantages/Disadvantages
________________________: the arithmetic average, computed by
adding up all the values and dividing by the number of
observations
1. Add up all the data values
2. Divide by the # of values
Population: μ = ___________________
Sample: x̅ = ___________________
Example: 3, 5, 2, 6, 3
Sum of all data values: ∑ xi = ___________
Number of data values: n = ___________
Divide: x̅ = (∑ xi) / n = ___________ (3.8)
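The same two steps (add up the values, divide by how many there are) as a short Python sketch. The function name `mean` is chosen for this illustration; the data are the five values from the example.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(mean([3, 5, 2, 6, 3]))  # 19 / 5 = 3.8
```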
Advantages/Disadvantages
_________________________: the middle value, value that lies in the
middle of the data when arranged in ascending order
1. Sort the data, from least to greatest
2. Count from the ends to the middle of the list
3. If there is 1 value in the middle (n odd), that is the median
If there are 2 values in the middle (n even), average them
Example: 3, 4, 2, 6, 3
Ordered data values,
smallest to largest: ___ ___ ___ ___ ___
Middle value: ________ (3)
Example: 3, 4, 2, 6, 3, 7
Ordered data values,
smallest to largest: ___ ___ ___ ___ ___ ___
Average of 2
middle values: ________ (3.5)
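The sorting-and-counting steps for the median can be sketched as follows, using the two example data sets. The function name `median` is chosen for this illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([3, 4, 2, 6, 3]))     # 3
print(median([3, 4, 2, 6, 3, 7]))  # 3.5
```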
Picture
Symmetric Data: Mean Median
_________________________: the most frequently occurring observation
1. Choose the value that occurs the most often
2. There may be multiple modes or no modes.
Example: 3, 4, 2, 6, 3 Which value is repeated the most often? _________
Example: 6, 3, 2, 6, 3 Which value is repeated the most often? _________
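A sketch that counts how often each value occurs and reports every value tied for the highest count. The function name `modes` is made up, and returning an empty list when no value repeats is a convention assumed here for "no mode".

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 4, 2, 6, 3]))  # [3]
print(modes([6, 3, 2, 6, 3]))  # [3, 6]
```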
TI 83/84 or StatCrunch
Example: Random sample of car emissions (CO2 equivalents in tons per year)
7.2, 7.1, 7.4, 7.9, 6.5, 7.2, 8.2, 9.3
Example: Exam scores in a statistics class taught using traditional lecture and a class taught
using a “flipped” classroom model were recorded. The “flipped” classroom is one where the
content is delivered via video and watched at home, while class time is used for activities and
exploration.
b. Suppose the score of 59.8 in the traditional course was incorrectly recorded as 598. How
would this affect the mean? The median? What property does this illustrate?
Example: The following data represent the pulse rates (beats per
minute) of nine students enrolled in a statistics course. Treat the 9
students as a population.
i)
ii)
iii)
Example: Hours Working – A random sample of 25 college
students was asked, “How many hours per week typically
do you work outside the home?” Their responses were as
follows.
b. Find the mean and median. Which measure of central tendency better describes the hours
worked?
class frequency
0.8 - 0.824
0.825 - 0.849
0.850 - 0.874
0.875 - 0.899
0.9 - 0.924
0.925 - 0.949
0.950 - 0.974
0.975 - 1.00
Example: The median for the given set of data values is 16. What is the missing value?
3 7 12 13 ____ 25 28 31
Example: In a class with 4 exams, a student has an average of 84 on the first 3 exams. What
must she score on the fourth exam to have an average of 86?
Section 3.2—Measures of Dispersion
Compare the wait times for two restaurants. Notice that the histograms for both restaurants are
centered at 8 minutes, but Restaurant B is much more consistent (wait times are between about 6 and
10 minutes) than Restaurant A (wait times are between about 1 and 15 minutes). How consistent or
inconsistent measurements are is called dispersion (or spread).
Definition:
______________________________: a measure of how much the data are spread out (in general, we
prefer less dispersion)
Measures of Variation:
______________________________: difference between the largest and smallest data values.
Advantage / Disadvantage
Subtract: (largest data value) – (smallest data value)
Example: 3, 5, 2, 4, 1
______________________________: measures the typical deviation between a data value and the mean.
1. Find the mean
2. Find deviations from each data value to the mean
3. Square all deviations and add them up
Note: we square them since the sum of the deviations
without squaring is always 0.
4. Divide by N (population) or n–1 (sample)
5. Take the square root
Population: σ = √( ∑(x − μ)² / N )          Sample: s = √( ∑(x − x̅)² / (n − 1) )
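The five steps can be followed line by line in a short Python sketch. The function name `standard_deviation` and its `population` flag are choices made for this illustration; the data reuse the five values 3, 5, 2, 4, 1 from the range example.

```python
import math


def standard_deviation(values, population=False):
    """Standard deviation following the five steps; divides by N or by n - 1."""
    mean = sum(values) / len(values)                           # step 1
    squared = [(x - mean) ** 2 for x in values]                # steps 2 and 3
    divisor = len(values) if population else len(values) - 1   # step 4
    return math.sqrt(sum(squared) / divisor)                   # step 5


data = [3, 5, 2, 4, 1]
print(standard_deviation(data, population=True))   # sqrt(10 / 5), about 1.414
print(standard_deviation(data, population=False))  # sqrt(10 / 4), about 1.581
```

The mean is 3, the squared deviations add up to 10, and only the divisor differs between the two results.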
Definition:
______________________________: a measure of the amount of data available; the number
of observations that are free to be any value given the sum
of the observations = n – 1
The larger the standard deviation, the __________________ dispersion the distribution has.
______________________________: the square of the standard deviation.
Definitions:
______________________________: when a statistic consistently underestimates or overestimates
a parameter
b. Compute the population standard deviation using the TI Calculator and StatCrunch.
c. Compute the sample variance and population variance of the exam scores.
Exercise: Consider the emissions from 8 cars from a rental fleet. Compute the following using
the TI 83/84 and StatCrunch
Car emissions sample: 7.2, 7.1, 7.4, 7.9, 6.5, 7.2, 8.2, 9.3
Range:
Standard Deviation:
Variance:
What if these were the car emissions for all the cars in a particular fleet?
Range:
Standard Deviation:
Variance:
Exercise: Do example 7 from homework.
Exercise: The following data represents the weights
(in grams) of a random sample of 50 M&M plain
candies.
b. On the basis of the histogram, comment on the appropriateness of using the Empirical Rule
to make any general statements about the weights of M&Ms.
d. Use the Empirical Rule to determine the percentage of M&Ms with weights between 0.803
and 0.947 grams, inclusive.
e. Use the Empirical Rule to determine the percentage of M&Ms that weigh more than 0.911
grams.
f. Determine the actual percentage of M&Ms that weigh more than 0.911 grams.
Exercise: SAT Math scores have a bell-shaped distribution with a mean of 515 and a standard
deviation of 114.
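For a bell-shaped distribution, the Empirical Rule says that about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean. A sketch of those intervals for the SAT Math figures above; the function name `empirical_rule_intervals` is made up for this illustration.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95%, and 99.7% of bell-shaped data."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


for share, (low, high) in empirical_rule_intervals(515, 114).items():
    print(share, low, high)
# 68% 401 629
# 95% 287 743
# 99.7% 173 857
```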
Section 3.3—Measures of Central Tendency & Dispersion from Grouped Data
Definition:
______________________________: data are summarized in a frequency table rather than given as
raw data values for each observation. Since we only know counts
of observations that fall within certain categories, we can’t
compute the mean or standard deviation using the formulas from
Sections 3.1 and 3.2.
Instead, μ ≈ x̅ ≈ (1st class midpoint × 1st class frequency + ⋯ + last class midpoint × last class frequency) / (total frequency) = ∑ xi fi / ∑ fi
xi is the _____________________ of the ith class = (lower limit + next lower limit) / 2
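A sketch of the grouped-data approximation of the mean: each class is represented by the average of its lower limit and the next lower limit, weighted by its frequency. The function name `grouped_mean`, the class limits, and the frequencies are all hypothetical.

```python
def grouped_mean(lower_limits, next_lower_limit, frequencies):
    """Approximate the mean from a frequency table.

    next_lower_limit is the lower limit a class after the last one would have;
    it is needed to compute the last class's representative value.
    """
    following = lower_limits[1:] + [next_lower_limit]
    x = [(low + nxt) / 2 for low, nxt in zip(lower_limits, following)]
    total = sum(frequencies)
    return sum(xi * fi for xi, fi in zip(x, frequencies)) / total


# Hypothetical classes 0-9, 10-19, 20-29, 30-39 with made-up frequencies.
print(grouped_mean([0, 10, 20, 30], 40, [2, 5, 8, 5]))  # 460 / 20 = 23.0
```

Here the xi values are 5, 15, 25, 35, so ∑ xi fi = 10 + 75 + 200 + 175 = 460 and ∑ fi = 20.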
Exercise: The five-year rate of return of a random sample of 40 large-blended mutual
funds is given. Approximate the mean and standard deviation of the five-year rate of
return using the TI-calculator.
Exercise: Recently a random sample of 25-34 year olds was asked, “How much do you
currently have in savings, not including retirement savings?” Approximate the mean
and standard deviation amount of savings using the TI-calculator.
Exercise: The following data represents the number of people aged 25 to 64 years covered by
health insurance (private or government) in 2003. Approximate the mean and standard
deviation for age using the TI-calculator.
b. Draw a frequency histogram of the data to verify that the distribution is bell shaped.
[Blank histogram axes. Title: Thermostat Temperature; vertical axis: frequency, 0 to 200; horizontal axis: Temperature (in degrees), 55 to 85]
c. On the basis of the histogram, comment on the appropriateness of using the empirical rule
to make any general statements about the temperature data.
d. According to the Empirical Rule, 95% of days in the month will be between what two
temperatures?
Section 3.4—Measures of Position and Outliers
Definitions:
__________________________: the number of standard deviations that a data value is from the mean
Formulas: z= z=
You scored higher on the ____________________ test relative to the rest of the students because it was
further above the mean.
Example: Roberto finishes a triathlon (750-meter swim, 5-kilometer run, and 20-kilometer bicycle)
in 63.2 minutes. Among all men in the race, the mean finishing time was 69.4 minutes with a
standard deviation of 8.0 minutes. Zandra finishes the same triathlon in 79.3 minutes. Among all
women in the race, the mean finishing time was 84.7 minutes with a standard deviation of 7.4
minutes. Who did better relative to their gender?
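One way to compare the two finishers is to put each time on the z-score scale, the number of standard deviations a value lies from its group's mean. The function name `z_score` is chosen for this sketch. Because a lower finishing time is better, the more negative z-score indicates the stronger relative performance.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


roberto = z_score(63.2, 69.4, 8.0)  # compared with all men
zandra = z_score(79.3, 84.7, 7.4)   # compared with all women
print(roberto, zandra)

# A lower finishing time is better, so the more negative z-score wins.
print("Roberto" if roberto < zandra else "Zandra")
```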
_______________________: Pk, the data value where k% of the observations are less than or equal to it.
Example: The 85th percentile of the IQ scores of honor roll graduates of a certain college is 124. What
does this mean?
_______% of the college’s honor roll graduates have an IQ of 124 or _____________, and
_______% of the college’s honor roll graduates have an IQ _____________ than 124.
Example: The 80th percentile of the distance students of a certain college drive is 12.4 miles. What does
this mean?
_______% of the students drive 12.4 miles or _____________, and
_______% of the students drive _____________ than 12.4 miles.
______________________________: extreme observations, values far from the bulk of the data
Determining outliers (Note: there are other methods for determining outliers as well):
Lower fence = _________ – 1.5(_________)
Upper fence = _________ + 1.5(_________)
Values outside the fences (less than the ______________________________ or greater than the
______________________________) are considered outliers.
Another way to determine outliers: |z-score| > 2 means the value is unusual (outlier)
Example: One variable that is measured by online homework systems is the amount of time a student
spends on homework for each section of text. The following is a summary of the number of
minutes a student spends for each section for last semester.
c. Suppose a student spent 2 hours doing homework for a section. Is this an outlier?
d. Do you believe that the distribution of time spent doing homework is skewed or
symmetric? Why?
Example: Hemoglobin in cats: 5.7 6.1 7.8 8.8 9.4 9.4 9.6 9.9
10.0 10.3 10.6 10.7 11.5 11.7 12.9 14.3
Q1 = Q2 = Q3 =
Skewed?
IQR =
Lower fence =
Upper fence =
Outliers?
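A sketch of the fence calculation for the hemoglobin data, using the standard rule that the fences lie 1.5 × IQR below Q1 and above Q3. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data, which is one common textbook convention; calculators and software may use slightly different quartile rules. The function names are made up, and `Decimal` is used so that values sitting exactly on a fence are not misclassified by floating-point round-off.

```python
from decimal import Decimal


def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def fences(values):
    """Return (Q1, Q3, lower fence, upper fence) using medians of the halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                   # lower half
    q3 = median(ordered[len(ordered) - half:])    # upper half (middle value left out if n is odd)
    iqr = q3 - q1
    step = Decimal("1.5") * iqr
    return q1, q3, q1 - step, q3 + step


raw = "5.7 6.1 7.8 8.8 9.4 9.4 9.6 9.9 10.0 10.3 10.6 10.7 11.5 11.7 12.9 14.3"
hemoglobin = [Decimal(x) for x in raw.split()]
q1, q3, low, high = fences(hemoglobin)
outliers = [x for x in hemoglobin if x < low or x > high]
print(q1, q3, low, high, outliers)
```

Under this convention Q1 = 9.1 and Q3 = 11.1, so the fences are 6.1 and 14.1 and the values 5.7 and 14.3 fall outside them; 6.1 sits exactly on the lower fence and is not flagged.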
Example: A credit card company has a fraud-detection
service that determines if a card has any unusual
activity. The company maintains a database of daily
charges on a customer’s credit card. If a day’s worth
of charges appears unusual, the customer is
contacted to make sure that the credit card has not
been compromised. The company uses the upper
fence as the cutoff point for the daily charges that
must be exceeded before the customer is contacted.
What is the cutoff point?
________________________________________: consists of the smallest data value, Q1, median, Q3, and
the largest data value
______________________________: lines extending from the box to the smallest and largest (non-
outlier) values
Skewness in Boxplots: Right-skewed: Median left of box’s center, right whisker longer
Symmetric: Median roughly at box’s center, whiskers equal length
Note: outliers are marked with an *
Example: Below are two boxplots for the ACT score for Incoming Freshmen at two colleges. Use the
boxplots to compare the two colleges.
Example: Consider the number of items produced per hour at a factory:
24 15 29 30 26
28 20 27 23 30
22 7 26 28 21
Fences:
Example: Do store-brand chocolate chip cookies have
fewer chips per cookie than Keebler’s Chips Deluxe
Chocolate Chip Cookies? To find out, a student
randomly selected 13 cookies of each brand and
counted the number of chips in the cookies.
Name Brand: 22 22 23 23 24 25 26 28 28 29 31 32 35
Store Brand: 15 17 19 21 22 23 24 24 26 27 28 28 33
a. Determine the 5-number summary for each brand of cookies using the TI-84 or by hand
Name Brand:
Store Brand:
Name Brand:
Store Brand: