
CN3421 Statistics Tutorial 1 (Week 3)

1. Other than the examples in the lecture notes, give one example for each of the 4 types of
data (ratio, interval, ordinal, and nominal), with a brief explanation.

Ratio: Population of a city. (0 means no people; the populations of two cities can be
compared by subtraction or by division)
Interval: Time shown on a clock. (0 doesn’t mean no time; 15 o’clock divided by 2
o’clock is meaningless)
Ordinal: Addresses of buildings on the same street. (The addresses indicate their order, but
not necessarily equal spacing between them)
Nominal: Names of different animals. (Used only to tell animals apart; no indication
of good or bad, or of how much better one is than another)

2. Can mode, mean, median, etc. be applied to all 4 types of data? (e.g., is there a mean for
nominal data?) Tick in the following table, and explain briefly.
                                 Ratio     Interval   Ordinal   Nominal
Mode                             Yes [1]   Yes [1]    Yes       Yes
Mean                             Yes       Yes        No        No
Median                           Yes       Yes        Yes       No
Max                              Yes       Yes        Yes       No
Min                              Yes       Yes        Yes       No
Range (max-min)                  Yes       Yes        No        No
Sum                              Yes       Yes [2]    No        No
Standard deviation               Yes       Yes        No        No
Count (number of observations)   Yes       Yes        Yes       Yes

[1] For continuous data, a mode does not always exist.
[2] Can be computed, but the result is meaningless.

3. Toss a die 5 times. We want to know the probability that the sum of the first two tosses is
greater than the sum of the last two tosses.

(a) In this story, what are the random experiment, random variables, and event(s)?

Random experiment: Tossing a die 5 times (the whole sequence, not any single toss)
Random variables: X = Sum of first two tosses, Y = Sum of last two tosses
Event: X>Y
(b) Assume the die is well balanced with 1-6 on its faces, and calculate the probability stated
in the question.

The answer is 0.4437.

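Slide 39 of Lecture 1 lists the probability of each possible sum of two tosses.
X and Y come from different tosses, so they are independent, and for each value x of X
we multiply P(X = x) by P(Y < x). For X > Y, we consider the following possibilities:

X=3, Y=2, P = 2/36 * 1/36
X=4, Y=2 or 3, P = 3/36 * (1/36 + 2/36)
X=5, Y=2, 3 or 4, P = 4/36 * (1/36 + 2/36 + 3/36)
…
X=12, Y=2,3,…,11, P = 1/36 * (1/36 + 2/36 + … + 2/36)

The required probability is the sum of these terms.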

MATLAB code to calculate:

a = [1 2 3 4 5 6 5 4 3 2 1]/36;  % P(sum = 2), P(sum = 3), ..., P(sum = 12)
S = sum(a);                       % sanity check: the probabilities add up to 1
N = numel(a);
ps = 0;
for i = 2:N
    P = a(i)*sum(a(1:i-1));       % P(X = i+1) * P(Y < i+1)
    ps = ps + P;
end

% ps is the answer

% Alternatively, you can calculate the probability of X = Y; the
% remaining probability is that of X not equal to Y, and by symmetry
% half of it is X > Y. So use the following single line to calculate
% and obtain the same answer.

ps = (1 - sum(a.^2))/2;
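
As an independent cross-check of the MATLAB result, the probability can also be obtained by enumerating all 6^5 equally likely outcomes of five tosses. The sketch below is written in Python rather than MATLAB so that it runs with the standard library only; the function name is illustrative and not part of the tutorial.

```python
from fractions import Fraction
from itertools import product


def prob_first_pair_beats_last_pair():
    """Exact P(toss1 + toss2 > toss4 + toss5) for five tosses of a fair die."""
    favourable = 0
    total = 0
    # Enumerate every equally likely outcome of five tosses (6**5 = 7776).
    for t1, t2, t3, t4, t5 in product(range(1, 7), repeat=5):
        total += 1
        if t1 + t2 > t4 + t5:  # the third toss plays no role in the event
            favourable += 1
    return Fraction(favourable, total)


p = prob_first_pair_beats_last_pair()
print(p, round(float(p), 4))  # 575/1296, about 0.4437
```

The exact fraction agrees with the rounded value 0.4437 quoted above.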

(c) Toss a die 5 times, and consider the following events: (i) first toss is 4; (ii) second toss is 6;
(iii) third toss is 3; (iv) fourth toss is greater than the sum of the third and fifth tosses.
Considering all pairs of events, which pairs are dependent or independent, and which are
exclusive or non-exclusive? Fill in the table below and briefly explain.

                 Dependent or independent   Exclusive or non-exclusive
(i) and (ii)     Independent                Non-exclusive
(i) and (iii)    Independent                Non-exclusive
(i) and (iv)     Independent                Non-exclusive
(ii) and (iii)   Independent                Non-exclusive
(ii) and (iv)    Independent                Non-exclusive
(iii) and (iv)   Dependent                  Non-exclusive

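Comments: Two events are independent if the occurrence of one does not affect the
probability of the other. Two events are exclusive if they cannot happen
simultaneously (e.g., second toss is 6, and first toss is greater than the second).
Events (iii) and (iv) are dependent because both involve the third toss.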
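
The table entries can be checked by brute force: two events are independent when P(A and B) = P(A) * P(B), and exclusive when P(A and B) = 0. The following Python sketch enumerates all outcomes of five fair tosses; the `events` dictionary and the `classify` helper are illustrative names, not part of the tutorial.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of five tosses of a fair die.
outcomes = list(product(range(1, 7), repeat=5))


def prob(event):
    """Exact probability of an event given as a predicate on an outcome tuple."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


events = {
    "i": lambda o: o[0] == 4,            # first toss is 4
    "ii": lambda o: o[1] == 6,           # second toss is 6
    "iii": lambda o: o[2] == 3,          # third toss is 3
    "iv": lambda o: o[3] > o[2] + o[4],  # fourth > third + fifth
}


def classify(name_a, name_b):
    """Return (independent, exclusive) for a pair of named events."""
    a, b = events[name_a], events[name_b]
    p_joint = prob(lambda o: a(o) and b(o))
    independent = p_joint == prob(a) * prob(b)
    exclusive = p_joint == 0
    return independent, exclusive


for pair in [("i", "ii"), ("i", "iii"), ("i", "iv"),
             ("ii", "iii"), ("ii", "iv"), ("iii", "iv")]:
    print(pair, classify(*pair))
```

Only the pair (iii) and (iv) fails the product test, because both events depend on the third toss.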
(d) Now A and B play a game: each of them tosses two dice simultaneously, and the one
whose sum is bigger is the winner. Suppose experience shows that (1) in the 30 games
that A won, B's sum was 5 or less in 18 of them; (2) the probability that B's sum is
5 or less can be calculated.
Now B tosses first, and his sum turns out to be 5 or less (e.g., 2+2 or 1+4). Can A estimate
his probability of winning?

Use Bayes’ theorem:

Event a: A wins. Event b: B's sum is 5 or less.
From experience (1), P(b|a) = 18/30 = 3/5.
From experience (2), P(b) = 1/36 + 2/36 + 3/36 + 4/36 (Slide 39 of Lecture 1) = 10/36 = 5/18
P(a) is obtained in part (b), i.e., if you know nothing, the probability of winning is 0.4437.

So the probability of winning, given that B's sum is known to be 5 or less:

P(a|b) = P(a) * P(b|a) / P(b) = 0.4437 * (3/5) / (5/18) = 0.958

A’s chance of winning is 95.8% (the information that B's sum is 5 or less makes A more
confident of winning the game).
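
The arithmetic of the Bayes step can be reproduced with exact fractions. This is a minimal Python sketch using the numbers given in this solution; `bayes_posterior` is an illustrative helper name, and the prior is recomputed from the two-dice sum distribution rather than taken as the rounded 0.4437.

```python
from fractions import Fraction


def bayes_posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Bayes' theorem: P(a|b) = P(a) * P(b|a) / P(b)."""
    return prior_a * likelihood_b_given_a / evidence_b


# Distribution of the sum of two fair dice (sums 2..12), as in part (b).
sum_probs = [Fraction(k, 36) for k in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]

p_a = (1 - sum(p * p for p in sum_probs)) / 2  # P(A wins), about 0.4437
p_b_given_a = Fraction(18, 30)                 # from experience (1)
p_b = sum(sum_probs[:4])                       # P(sum <= 5) = 10/36

posterior = bayes_posterior(p_a, p_b_given_a, p_b)
print(posterior, round(float(posterior), 3))   # 23/24, about 0.958
```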

4. If Z follows a standard normal distribution, using the graph on Slide 53 of Lecture 1, can
you calculate the probability of the following events: –2<Z<1, Z<1, Z>2, Z=0?

P(–2<Z<1) = the area under the curve for the interval [–2, 1] = 34.13% * 2 + 13.59% = 81.85%
P(Z<1) = the area under the curve to the left of 1 = 84.1% (look at the “percentile” line)
P(Z>2) = the area under the curve to the right of 2 = 1 – P(Z<2) = 1 – 97.7% = 2.3%
P(Z=0) = 0 (for a continuous distribution, the area over a single point is zero)
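
The values read off the graph can be compared with the standard normal cumulative distribution function, which can be written in terms of the error function. The Python sketch below uses only the standard library; `phi` is an illustrative name, and small differences from the graph values come from rounding on the slide.

```python
import math


def phi(z):
    """Standard normal CDF, P(Z < z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_between = phi(1) - phi(-2)  # P(-2 < Z < 1), about 0.8186
p_less_1 = phi(1)             # P(Z < 1), about 0.8413
p_greater_2 = 1 - phi(2)      # P(Z > 2), about 0.0228

print(round(p_between, 4), round(p_less_1, 4), round(p_greater_2, 4))
```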

5. A new mayor was elected to govern a city from the year 2012, and thus the economic
policy was changed from Policy 1 to Policy 2. The following table shows the GDP growth
and policy applied over 15 years.

Year GDP growth Policy applied


2003 5.10% 1
2004 4.10% 1
2005 4.50% 1
2006 4.70% 1
2007 4.60% 1
2008 5.20% 1
2009 5.10% 1
2010 4.40% 1
2011 4.20% 1
2012 5.50% 2
2013 5.60% 2
2014 5.90% 2
2015 6.10% 2
2016 6.20% 2
2017 6.40% 2

(a) What are the variables in the above data, and what are their types (ratio, interval, etc.)?
Year: Interval (year 0 doesn’t mean nothing, and the ratio between two years has no meaning)
GDP growth: Ratio (0 means no growth, and a ratio tells how many times larger one growth rate is)
Policy applied: Nominal (the numbers only label 2 different policies and have no
quantitative meaning)

(b) What are the average and standard deviation of GDP growth for two types of policies,
respectively?
First, reorganize the data in Excel so that Policy 1 is in one column and Policy 2 in another.

You may use the “average” and “stdev” functions to find the answers, or use the “Data Analysis”
tool and choose descriptive statistics.

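Policy 1: Avg: 4.66%, Stdev: 0.004035 (0.40%)
Policy 2: Avg: 5.95%, Stdev: 0.003507 (0.35%)
(The values 0.001345 and 0.001432 for Policy 1 and Policy 2 are the standard errors of the
mean, i.e., Stdev divided by the square root of the sample size, not the standard deviations.)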
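
As a cross-check outside Excel, the following Python sketch recomputes the summary statistics of each group with the standard library. Growth rates are entered in percent; the list names and the `summarize` helper are illustrative and not part of the tutorial. It reports the sample standard deviation and also the standard error of the mean (standard deviation divided by the square root of the sample size), since the two are easy to confuse in Excel's Descriptive Statistics output.

```python
import math
import statistics

# GDP growth in percent, grouped by policy (from the table in Question 5).
policy1 = [5.1, 4.1, 4.5, 4.7, 4.6, 5.2, 5.1, 4.4, 4.2]  # 2003-2011
policy2 = [5.5, 5.6, 5.9, 6.1, 6.2, 6.4]                  # 2012-2017


def summarize(values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)      # divides by n - 1, like Excel's STDEV
    sem = stdev / math.sqrt(len(values))  # standard error of the mean
    return mean, stdev, sem


for name, data in (("Policy 1", policy1), ("Policy 2", policy2)):
    mean, stdev, sem = summarize(data)
    print(f"{name}: mean={mean:.2f}%  stdev={stdev:.4f}%  sem={sem:.4f}%")
```

In fractional units, a standard error of 0.1345% is 0.001345 and one of 0.1432% is 0.001432.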

(c) Use hypothesis testing to find whether Policy 2 is better than Policy 1.
By inspection, GDP growth appears noticeably higher under Policy 2 than under
Policy 1. So we write down the hypotheses below:
H0: μ2 –μ1 = 0; H1: μ2 –μ1 > 0 (a one-tailed test will be done, to show Policy 2 is “better”)
Since the sample sizes are small (9 and 6 observations for Policy 1 and Policy 2, respectively)
and the population variances are unknown, we should use a two-sample t-test. As the standard
deviations of the two samples are quite close, it makes little difference whether equal or
unequal variances are assumed.

Click “Data Analysis”, choose “t-Test: Two-Sample Assuming Unequal (or Equal)
Variances”, then set the input ranges and options.

Test results:

The p-value is 1.3e-5, which is very small, indicating that H0 should be rejected. There is thus
strong statistical evidence that GDP grew faster under Policy 2 than under Policy 1.
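
The test statistic behind this conclusion can be recomputed by hand. The Python sketch below evaluates the two-sample t statistic and its degrees of freedom under both the equal-variance (pooled) and unequal-variance (Welch) assumptions, using only the standard library. The function name `two_sample_t` is illustrative; the p-value itself is not computed here, because that needs the t-distribution CDF from Excel or a statistics library.

```python
import math
import statistics

# GDP growth in percent, grouped by policy (from the table in Question 5).
policy1 = [5.1, 4.1, 4.5, 4.7, 4.6, 5.2, 5.1, 4.4, 4.2]
policy2 = [5.5, 5.6, 5.9, 6.1, 6.2, 6.4]


def two_sample_t(sample1, sample2, equal_var=True):
    """Return (t, degrees of freedom) for H1: mean(sample2) > mean(sample1)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    diff = statistics.mean(sample2) - statistics.mean(sample1)
    if equal_var:
        # Pooled-variance t-test with n1 + n2 - 2 degrees of freedom.
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Welch's t-test with the Welch-Satterthwaite degrees of freedom.
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return diff / se, df


t_pooled, df_pooled = two_sample_t(policy1, policy2, equal_var=True)
t_welch, df_welch = two_sample_t(policy1, policy2, equal_var=False)
print(f"pooled: t={t_pooled:.3f}, df={df_pooled}")
print(f"Welch:  t={t_welch:.3f}, df={df_welch:.2f}")
```

Both variants give a t statistic above 6, far beyond the one-tailed 5% critical value of about 1.771 for 13 degrees of freedom, so the choice between them does not change the conclusion.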
