1. Apart from the examples in the lecture notes, give one example of each of the 4 types of
data (ratio, interval, ordinal, and nominal), with a brief explanation.
Ratio: Population of a city. (0 means no people, and the population of one city can be
compared with that of another city by subtraction or division.)
Interval: Time of day on a clock. (0 o'clock does not mean "no time", and 15 o'clock divided
by 2 o'clock is meaningless, although the difference between two times is meaningful.)
Ordinal: Addresses (street numbers) of buildings on the same street. (The numbers indicate
the buildings' order, but not necessarily equal spacing between them.)
Nominal: Names of different animals. (They only distinguish one animal from another; they
indicate neither which is better or worse nor by how much one differs from another.)
2. Can the mode, mean, median, etc. be applied to all 4 types of data? (E.g., is there a mean
for nominal data?) Tick the following table, and explain briefly.
                                Ratio   Interval   Ordinal   Nominal
Mode                            Yes*    Yes*       Yes       Yes
Mean                            Yes     Yes        No        No
Median                          Yes     Yes        Yes       No
Max                             Yes     Yes        Yes       No
Min                             Yes     Yes        Yes       No
Range (max - min)               Yes     Yes        No        No
Sum                             Yes     Yes**      No        No
Standard deviation              Yes     Yes        No        No
Count (number of observations)  Yes     Yes        Yes       Yes
* Yes, but continuous data do not always have a mode.
** Yes, but meaningless.
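One way to read the table is as a lookup from measurement level to the statistics that make sense for it. The Python sketch below encodes the table that way. The names `ALLOWED` and `summarize` are hypothetical. Two choices go beyond the table and are assumptions. First, the interval-level sum is left out, because the table marks it as meaningless. Second, the ordinal median uses `statistics.median_low`, so the result is always an observed category rather than an average of two ranks.

```python
import statistics

# Statistics the table marks as usable for each level (interval "sum" excluded
# because the table calls it meaningless).
ALLOWED = {
    "nominal": {"count", "mode"},
    "ordinal": {"count", "mode", "median", "max", "min"},
    "interval": {"count", "mode", "median", "max", "min", "mean", "range", "std"},
    "ratio": {"count", "mode", "median", "max", "min", "mean", "range", "sum", "std"},
}


def summarize(values, level):
    """Compute only the statistics allowed for the given measurement level."""
    allowed = ALLOWED[level]
    out = {}
    if "count" in allowed:
        out["count"] = len(values)
    if "mode" in allowed:
        out["mode"] = statistics.mode(values)
    if "median" in allowed:
        # For ordinal data, median_low always returns an observed value.
        out["median"] = (statistics.median_low(values) if level == "ordinal"
                         else statistics.median(values))
    if "max" in allowed:
        out["max"] = max(values)
    if "min" in allowed:
        out["min"] = min(values)
    if "mean" in allowed:
        out["mean"] = statistics.mean(values)
    if "range" in allowed:
        out["range"] = max(values) - min(values)
    if "sum" in allowed:
        out["sum"] = sum(values)
    if "std" in allowed:
        out["std"] = statistics.stdev(values)  # sample standard deviation
    return out
```

For example, `summarize(["cat", "dog", "dog"], "nominal")` returns only the count and the mode. A mean is never attempted for names.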
3. Toss a die 5 times. We want to know the probability that the sum of the first two tosses is
greater than the sum of the last two tosses.
(a) In this story, what are the random experiment, the random variables, and the event(s)?
Slide 39 of Lecture 1 lists the probability of each possible sum of two tosses. Let X be the
sum of the first two tosses and Y the sum of the last two. For X > Y, we consider the
following possibilities:
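% a(k) = P(sum of two tosses = k+1), for the sums 2, 3, ..., 12
a = [1 2 3 4 5 6 5 4 3 2 1]/36;
N = numel(a);
ps = 0;
for i = 2:N
    % P(X = i-th sum) * P(Y < i-th sum); X and Y are independent
    P = a(i)*sum(a(1:i-1));
    ps = ps + P;
end
% ps is the answer: 575/1296, about 0.4437
% Equivalent closed form: by symmetry P(X>Y) = P(X<Y), and P(X=Y) = sum(a.^2)
ps2 = (1 - sum(a.^2))/2;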
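The MATLAB result can be cross-checked by brute force. Since every one of the 6^5 outcomes of five fair tosses is equally likely, counting the outcomes with X > Y gives the probability exactly. The sketch below is in Python and uses exact fractions. The function name is hypothetical. The third toss is enumerated but plays no role.

```python
from fractions import Fraction
from itertools import product


def prob_first_two_beat_last_two():
    """P(t1 + t2 > t4 + t5) for five fair dice, by enumerating all outcomes."""
    wins = 0
    total = 0
    for t1, t2, t3, t4, t5 in product(range(1, 7), repeat=5):
        total += 1
        if t1 + t2 > t4 + t5:
            wins += 1
    return Fraction(wins, total)


p = prob_first_two_beat_last_two()
print(p, float(p))  # 575/1296, about 0.4437
```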
(c) Toss a die 5 times and consider the following events: (i) the first toss is 4; (ii) the
second toss is 6; (iii) the third toss is 3; (iv) the fourth toss is greater than the sum of
the third and fifth tosses. For each pair of events, are they dependent or independent, and
mutually exclusive or not? Fill in the table below and briefly explain.
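Each pair of events can be checked mechanically. Two events A and B are independent if P(A∩B) = P(A)P(B), and mutually exclusive if P(A∩B) = 0. The Python sketch below enumerates all 6^5 outcomes and applies both definitions. The event labels and helper names are hypothetical. The sketch is a check on the definitions, not the official answer table.

```python
from fractions import Fraction
from itertools import combinations, product

# All equally likely outcomes (t1, ..., t5) of five fair tosses.
OUTCOMES = list(product(range(1, 7), repeat=5))

EVENTS = {
    "i": lambda t: t[0] == 4,                  # first toss is 4
    "ii": lambda t: t[1] == 6,                 # second toss is 6
    "iii": lambda t: t[2] == 3,                # third toss is 3
    "iv": lambda t: t[3] > t[2] + t[4],        # fourth > third + fifth
}


def prob(pred):
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for t in OUTCOMES if pred(t)), len(OUTCOMES))


def classify(a, b):
    """Check independence and mutual exclusivity of events a and b."""
    pa, pb = prob(EVENTS[a]), prob(EVENTS[b])
    pab = prob(lambda t: EVENTS[a](t) and EVENTS[b](t))
    return {"independent": pab == pa * pb, "exclusive": pab == 0}


for a, b in combinations(EVENTS, 2):
    print(a, b, classify(a, b))
```

Events that involve different tosses come out independent. Events (iii) and (iv) share the third toss, which is why that pair deserves a closer look.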
So, given that B is known to be less than 5, A's probability of winning is 95.8% (knowing
that B is less than 5 makes A more confident of winning the gamble).
4. If Z follows a standard normal distribution, can you use the graph on Slide 53 of Lecture 1
to calculate the probabilities of the following events: –2 < Z < 1, Z < 1, Z > 2, Z = 0?
P(–2 < Z < 1) = the area under the curve over the interval [–2, 1] = 34.13% × 2 + 13.59% = 81.85%
P(Z < 1) = the area under the curve to the left of 1 = 84.1% (read from the "percentile" line)
P(Z > 2) = the area under the curve to the right of 2 = 1 – P(Z < 2) = 1 – 97.7% = 2.3%
P(Z = 0) = 0 (Z is continuous, so the area over a single point is zero)
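The chart readings can be checked numerically. Python's `statistics.NormalDist` (Python 3.8+) computes the standard normal CDF Φ directly instead of adding rounded chart areas. The results should agree with the chart-based answers to within rounding. For example, Φ(1) – Φ(–2) ≈ 0.8186, while the rounded chart areas give 81.85%.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_between = Z.cdf(1) - Z.cdf(-2)  # P(-2 < Z < 1)
p_below_1 = Z.cdf(1)              # P(Z < 1)
p_above_2 = 1 - Z.cdf(2)          # P(Z > 2)

print(f"{p_between:.4f} {p_below_1:.4f} {p_above_2:.4f}")
```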
5. A new mayor was elected to govern a city from 2012, and the city's economic policy was
therefore changed from Policy 1 to Policy 2. The following table shows the GDP growth and
the policy applied over 15 years.
(a) What are the variables in the above data, and what are their types (ratio, interval, etc.)?
Year: Interval (year 0 does not mean "no time", and the ratio between two years has no meaning)
GDP growth: Ratio (0 means no growth, and a ratio tells how many times faster one growth rate
is than another)
Policy applied: Nominal (it only distinguishes 2 different policies; although numbers are
used, they have no quantitative meaning)
(b) What are the average and standard deviation of GDP growth for two types of policies,
respectively?
First, reorganize the data in Excel as follows (Policy 1 in one column, Policy 2 in another).
Then use the AVERAGE and STDEV functions to find the answers, or use the "Data Analysis"
tool and choose "Descriptive Statistics" (options shown above).
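The same two-column reorganization and summary can be done outside Excel. The minimal Python sketch below assumes each row of the table is a `(year, gdp_growth, policy)` tuple; that row format and the function name are hypothetical. It groups the growth values by policy and reports the mean and the sample standard deviation. `statistics.stdev` divides by n – 1, as Excel's STDEV does.

```python
import statistics
from collections import defaultdict


def summarize_by_policy(rows):
    """rows: iterable of (year, gdp_growth, policy) tuples (assumed format).

    Returns {policy: (mean, sample standard deviation)}, the same quantities
    as Excel's AVERAGE and STDEV applied to each policy's column.
    """
    groups = defaultdict(list)
    for _year, growth, policy in rows:
        groups[policy].append(growth)  # one "column" per policy
    return {p: (statistics.mean(g), statistics.stdev(g)) for p, g in groups.items()}
```

Applied to the 15 rows of the table (not reproduced here), this should give the same numbers as the Excel functions.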
(c) Use hypothesis testing to determine whether Policy 2 is better than Policy 1.
From inspection, GDP growth appears faster under Policy 2 than under Policy 1, so we write
down the following hypotheses:
H0: μ2 – μ1 = 0; H1: μ2 – μ1 > 0 (a one-tailed test, to show that Policy 2 is "better")
Since the sample sizes are small (9 and 6 observations for the two policies, respectively)
and the population variances are unknown, we should use a two-sample t-test. As the standard
deviations of the two policies are quite close, assuming equal or unequal variances makes
little difference.
The p-value is 1.3 × 10⁻⁵, which is very small, so H0 is rejected: there is strong
statistical evidence that Policy 2 makes GDP grow faster than Policy 1.
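For readers without Excel's t-test tool, the one-tailed two-sample t-test can be sketched with only the Python standard library. This is a substitute technique, not the method used above. The right-tail p-value comes from integrating the Student t density with Simpson's rule. In recent SciPy versions, `scipy.stats.ttest_ind(..., alternative="greater")` would be the usual library route. All function names here are hypothetical, and the example data in the test are toy numbers, not the GDP table.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t, df, steps=20000):
    """Right-tail probability P(T > t) via Simpson's rule on [0, |t|] (steps even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def one_tailed_t_test(x1, x2, equal_var=True):
    """Test H0: mu2 - mu1 = 0 against H1: mu2 - mu1 > 0.

    Returns (t statistic, degrees of freedom, one-tailed p-value).
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.mean(x1), statistics.mean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # sample variances
    if equal_var:
        # Pooled-variance (Student) t-test
        df = n1 + n2 - 2
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    else:
        # Welch t-test with Welch-Satterthwaite degrees of freedom
        a1, a2 = v1 / n1, v2 / n2
        se = math.sqrt(a1 + a2)
        df = (a1 + a2) ** 2 / (a1 ** 2 / (n1 - 1) + a2 ** 2 / (n2 - 1))
    t = (m2 - m1) / se
    return t, df, t_sf(t, df)
```

When the two sample standard deviations are close, the pooled and Welch versions give similar p-values. This matches the remark above that either variance assumption is acceptable here.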