
Session 2 - Spring 2019 - Economics Main Exam

SEAT NUMBER:

STUDENT NUMBER:

SURNAME:
(FAMILY NAME)

OTHER NAMES:

This paper and all materials issued must be returned at the end of the examination.
They are not to be removed from the exam centre.

26134 Business Statistics

Time Allowed: 120 minutes.
Reading time: 10 minutes.
Reading time is for reading only. You are not permitted to write, calculate or mark your
paper in any way during reading time.

Open Book

Permitted materials for this exam:
• Calculators (including programmable)
• Any annotated textbooks
• Any paper materials that are handwritten, photocopied or typed, including lecture notes

Materials provided for this exam:
• This examination paper

Students please note:
• This question paper MUST NOT be removed from the Examination Centre.
• There are legal consequences if you take the question paper with you when you leave.
• Calculators including programmable calculators are allowed.
• You must record your student name and number carefully on this exam paper.
• You should answer all questions on this exam paper.
• You must use clear handwriting. If we cannot read what you have written, your answer will be
treated as non-response and given zero marks.
• You can use your own draft papers for any working. This exam paper will be collected and must
not be removed from the Examination Centre.

Good luck and wish you all success!

Examination Conditions:
It is your responsibility to fill out and complete your details in the space provided on all the
examination material provided to you. Use the time before your examination to do so as you will
not be allowed any extra time once the exam has ended.

You are not permitted to have on your desk or on your person any unauthorised material. This
includes, but is not limited to:
• Mobile phones
• Smart watches and bands
• Electronic devices
• Draft paper (unless provided)
• Textbooks (unless specified)
• Notes (unless specified)

You are not permitted to obtain assistance by improper means or ask for help from or give help to
any other person.

If you wish to leave and be re-admitted (including to use the toilet), you have to wait until
90 mins has elapsed.

If you wish to leave the exam room permanently, you have to wait until 60 mins has elapsed.

You are not permitted to leave your seat (including to use the toilet) during the final 15 mins.

During the examination you must first seek permission (by raising your hand) from a supervisor
before:
• Leaving early
• Using the toilet
• Accessing your bag

Misconduct action will be taken against you if you breach university rules.

Declaration: I declare that I have read the advice above on examination conduct and listened to
the examination supervisor’s instructions for this exam. In addition, I am aware of the
university’s rules regarding misconduct during examinations. I am not in possession of, nor do I
have access to, any unauthorised material during this examination. I agree to be bound by the
university’s rules, codes of conduct, and other policies relating to examinations.

Signature:

Date:

Do not open your exam paper until instructed.


Page 1 of 13
26134 - Business Statistics – Session 2 – Spring – Main, 2019

Question 1. For numerical data, what is the difference between the measurement scale of interval and the
measurement scale of ratio? Give an example of numerical data whose measurement scale is ratio.

Interval: zero has no natural meaning (differences are meaningful, ratios are not).
Ratio: zero has a natural meaning (both differences and ratios are meaningful).

Example of interval data: temperature in °C.
Example of ratio data: salary.

Question 2. A toy company is conducting market research on the number of children a household
has, aiming to improve its marketing strategy. Based on a random sample of households, the frequency
distribution below is constructed. Compute the expected number of children a household has.

[Figure: bar chart of Frequency (vertical axis, 0 to 8) against number of children (horizontal axis, 0 to 4).]

$$E(X) = \frac{0 \times 3 + 1 \times 4 + 2 \times 6 + 3 \times 2 + 4 \times 4}{3 + 4 + 6 + 2 + 4} = 2$$

Note that the correct formula can be written in other, equivalent ways.
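As a quick check of the arithmetic, the weighted average can be computed directly. This is a minimal sketch: the frequencies 3, 4, 6, 2, 4 are read off the worked solution, and the variable names are illustrative only.

```python
# Frequencies for households with 0..4 children, taken from the worked solution
children = [0, 1, 2, 3, 4]
frequency = [3, 4, 6, 2, 4]

# Expected value = sum(x * f) / sum(f)
expected = sum(x * f for x, f in zip(children, frequency)) / sum(frequency)
print(expected)
```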

Question 3. Troye Sivan is an Australian singer and songwriter. He is going to give an Australian tour in
2020. 40 VIP tickets will be given to fans in Sydney for free in a lottery. These tickets are sponsored by
Virgin Australia (15 tickets), Toyota Australia (10 tickets), NSW government (10 tickets) and Warner
Bros. Australia (5 tickets). David Tran is a big and lucky fan of Troye and he won four tickets. If we
would like to compute the probability that David has exactly three tickets sponsored by Toyota, can we use
the binomial distribution? Explain why / why not.

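We cannot use the binomial distribution. The tickets are drawn without replacement, so the trials are
not independent: the probability of drawing a Toyota ticket changes after each draw.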
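To illustrate why independence matters here, the sketch below compares the binomial formula with a without-replacement (hypergeometric) calculation. The hypergeometric model is not part of the exam answer; it is offered as an assumption that David's four tickets are a random draw from the 40, and the variable names are hypothetical.

```python
from math import comb

total, toyota, drawn, wanted = 40, 10, 4, 3

# Without replacement (assumption: the 4 tickets are a random draw from the 40)
p_hyper = comb(toyota, wanted) * comb(total - toyota, drawn - wanted) / comb(total, drawn)

# What the binomial formula would give if the draws were treated as independent
p = toyota / total
p_binom = comb(drawn, wanted) * p**wanted * (1 - p) ** (drawn - wanted)

print(round(p_hyper, 4), round(p_binom, 4))
```

The two numbers differ, which is the practical consequence of the dependence between draws.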

Question 4. There are two stocks an investor can invest in. The expected rate of return is the same for
both Stock A and Stock B. Without using any maths, explain why a risk-averse investor would
prefer the stock whose returns have the smaller variance.

Variance in this example measures variability/fluctuation/volatility/uncertainty/risk of stock returns,
namely the likelihood of a stock taking extreme returns, both positive and negative. A risk-averse investor
does not like uncertainty, thus prefers a stock with a smaller variance.

It is of interest to know the dependence structure between income level (Low, Mid, High) of parents and
the education attainment (Primary, Secondary, Tertiary) of their oldest child. Based on the following
incomplete contingency table, answer Question 5 and Question 6.

              Primary school   Secondary school   Tertiary school   Total
Low income          34                12                  3           49
Mid income          15                18                 17           50
High income          2                 8                 32           42
Total               51                38                 52          141

Question 5. Suppose the income level of parents is high. What is the probability that their oldest child has
tertiary education?

$$P(\text{tertiary} \mid \text{high}) = \frac{P(\text{tertiary} \cap \text{high})}{P(\text{high})} = \frac{32/141}{42/141} = 0.7619$$
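The same conditional probability can be read straight from the High income row of the table. A minimal sketch, with the dictionary layout chosen purely for illustration:

```python
# Counts from the contingency table (rows: parents' income, columns: child's education)
table = {
    "Low": {"Primary": 34, "Secondary": 12, "Tertiary": 3},
    "Mid": {"Primary": 15, "Secondary": 18, "Tertiary": 17},
    "High": {"Primary": 2, "Secondary": 8, "Tertiary": 32},
}

# Conditioning on high income restricts attention to the High row
high_row = table["High"]
p_tertiary_given_high = high_row["Tertiary"] / sum(high_row.values())
print(round(p_tertiary_given_high, 4))
```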

Question 6. The $\chi^2$ statistic used to test whether or not parents’ income level and the education attainment
of their oldest child are independent equals 37.8263. At the 5% level, what is the critical value and what
is the test result?

$\chi^2 \sim \chi^2(v)$ with $v = (3-1) \times (3-1) = 4$.

The critical value is $\chi^2_{0.05,4} = 9.488$. Because $37.8263 > 9.488$, there is enough statistical evidence to reject
the null hypothesis that parents’ income level and the education attainment of their oldest child are independent;
thus we accept the alternative that they are dependent.
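The decision step can be checked without statistical tables. The sketch below relies on a standard closed form for the chi-square upper tail that holds when the degrees of freedom are even; the helper name `chi2_sf_even_df` is hypothetical, and the statistic 37.8263 is taken as given in the question rather than recomputed.

```python
from math import exp, factorial

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df % 2 != 0:
        raise ValueError("this closed form only holds for even degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))

df = (3 - 1) * (3 - 1)
tail_at_critical = chi2_sf_even_df(9.488, df)  # should be close to 0.05
p_value = chi2_sf_even_df(37.8263, df)         # statistic as stated in the question
reject_null = p_value < 0.05
print(df, round(tail_at_critical, 4), reject_null)
```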

Question 7. Explain the following property of a random variable that follows the Poisson distribution:
“Poisson arrivals see exponential waiting time”.

If the number of arrivals within a given period of time follows a Poisson distribution, then the inter-
arrival waiting time follows an exponential distribution.

Question 8. It is known that the number of vehicles passing through a roundabout follows a Poisson
distribution. On average during a single peak hour, 60 vehicles pass through the roundabout, whereas on
average during an off-peak hour, the number of vehicles that pass through the roundabout halves. What is the
probability that during an off-peak hour, after one vehicle passes the roundabout, one needs to wait
between one minute and three minutes to see the next vehicle?

Remember: If the number of arrivals within a given period of time follows a Poisson distribution, then the
inter-arrival waiting time follows an exponential distribution. The cumulative distribution function of an
exponential distribution is $F(x) = 1 - e^{-\lambda x}$, where $x$ is the time until the next arrival.

During an off-peak hour there are 30 vehicles on average, so per minute there are 0.5 vehicles on average.
This means $\lambda = 0.5$. Let $X$ denote the inter-arrival waiting time in minutes.

$$P(1 < X < 3;\ \lambda = 0.5) = \left(1 - e^{-0.5 \times 3}\right) - \left(1 - e^{-0.5 \times 1}\right) = 0.3834$$
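The calculation can be reproduced with the exponential CDF. A minimal sketch, with an illustrative helper name:

```python
from math import exp

def exponential_cdf(x, rate):
    """P(X <= x) for an exponential waiting time with the given rate."""
    return 1 - exp(-rate * x)

rate_per_minute = 30 / 60  # 30 vehicles per off-peak hour

# P(1 < X < 3) is the difference of the CDF at the two endpoints
p_wait = exponential_cdf(3, rate_per_minute) - exponential_cdf(1, rate_per_minute)
print(round(p_wait, 4))
```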

Question 9. The temperature in New South Wales (NSW) in September is normally distributed with an
unknown mean temperature and an unknown standard deviation. A random sample of 16 days in
September in NSW history gives a mean temperature of 17°C and a standard deviation of 9°C. Construct
the 90% confidence interval for the population mean temperature in September in NSW.

$\bar{X} = 17$, $s = 9$, $n = 16$. Given $n$ and $\alpha = 1 - 0.9 = 0.1$, it follows that $t_{0.05,15} = 1.753$.

$$17 \pm 1.753 \times \frac{9}{\sqrt{16}} \Rightarrow (13.0557,\ 20.9443)$$
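The interval follows from the usual t-based formula. The sketch below hard-codes the tabulated critical value 1.753 from the solution rather than computing it; variable names are illustrative.

```python
from math import sqrt

sample_mean, sample_sd, n = 17, 9, 16
t_crit = 1.753  # t_{0.05,15}, taken from the solution (table value)

# Half-width = critical value times the standard error of the mean
half_width = t_crit * sample_sd / sqrt(n)
lower, upper = sample_mean - half_width, sample_mean + half_width
print(lower, upper)
```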

Question 10. Suppose in the above question the population standard deviation is known to be 9°C. For the
90% confidence interval for the mean temperature, without using maths can you explain if it would be
wider or tighter than the one obtained in Question 9? Why?

It is going to be tighter. Because the population standard deviation σ is now known, we do not need to
estimate it with the sample standard deviation. In Question 9, the unknown standard deviation leads to the use
of the sample standard deviation, which is itself a random variable and thus adds an extra source of
uncertainty. Without this extra uncertainty, the confidence interval becomes tighter, meaning that
we are more certain about the location/value of the population mean.
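Although the question asks for a verbal argument, the effect can be seen numerically by comparing half-widths. This is a sketch only: the normal critical value comes from Python's `statistics.NormalDist`, while the t value 1.753 is the table value used in Question 9.

```python
from math import sqrt
from statistics import NormalDist

sd, n = 9, 16
z_crit = NormalDist().inv_cdf(0.95)  # about 1.645, used when sigma is known
t_crit = 1.753                       # t_{0.05,15} from Question 9, sigma unknown

z_half_width = z_crit * sd / sqrt(n)
t_half_width = t_crit * sd / sqrt(n)
print(round(z_half_width, 4), round(t_half_width, 4))
```

The z-based half-width is the smaller of the two, which matches the "tighter" conclusion.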

Question 11. A simple party game uses a fair die with 6 faces. The game is won if a six is rolled. What is
the probability that out of 5 games, someone wins two or four times?

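$$X \sim \mathrm{Bin}\left(n = 5,\ p = \frac{1}{6}\right)$$

$$P(X=2) + P(X=4) = \binom{5}{2}\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{3} + \binom{5}{4}\left(\frac{1}{6}\right)^{4}\left(\frac{5}{6}\right)^{1} = 0.1608 + 0.0032 = 0.164$$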
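The two binomial terms can be evaluated with `math.comb`. A minimal sketch with an illustrative helper name:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_games, p_win = 5, 1 / 6
p_two_or_four = binomial_pmf(2, n_games, p_win) + binomial_pmf(4, n_games, p_win)
print(round(p_two_or_four, 3))
```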

Based on the following material, answer Question 12 to Question 14.


If a house is installed with solar panels, the owner can sell excess electricity generated by the solar panels
to the local power grid for a profit. As a result, the daily electricity cost for the owner of the house can be
negative (the owner gets paid by the energy company more than the owner pays the company). Suppose
the daily electricity cost (in the unit of dollars) for the owner is normally distributed with mean zero and
standard deviation 24. With a random sample of 18 days, you would like to study the sampling
distribution of sample variance.

Question 12. What distribution should you use? The sample variance does not follow this sampling
distribution directly, but through a transformation. What is this transformation? What is the parameter
that characterises this distribution?
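We should use the $\chi^2$ distribution. The transformation is

$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1).$$

The parameter that characterises this distribution is its degrees of freedom, $n - 1 = 17$.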

Question 13. The random sample of 18 days gives a sample standard deviation 27. Test if the population
standard deviation is indeed 24. Assume the significance level α =0.05.

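$$\frac{(n-1)s^2}{\sigma^2} = \frac{17 \times 27^2}{24^2} = 21.5156$$

The critical values are $\chi^2_{0.975,17} = 7.564$ and $\chi^2_{0.025,17} = 30.191$. The statistic lies between them, so
fail to reject, and maintain the null.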
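A short numerical check of the variance test. The two critical values are the table values quoted in the solution and are hard-coded, not computed; names are illustrative.

```python
n, sample_sd, sigma_0 = 18, 27, 24

# Test statistic (n - 1) * s^2 / sigma_0^2, chi-square with n - 1 df under the null
chi2_stat = (n - 1) * sample_sd**2 / sigma_0**2

# Two-sided critical values at alpha = 0.05, taken from the solution
lower_crit, upper_crit = 7.564, 30.191
reject_null = chi2_stat < lower_crit or chi2_stat > upper_crit
print(round(chi2_stat, 4), reject_null)
```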

Question 14. The same random sample of 18 days also gives a sample mean of 3. If you fail to reject the
test in Question 13, you assume the population standard deviation to be 24; otherwise you assume it to be
unknown. Test if the population mean is indeed zero. Assume the significance level α =0.05. If you
cannot answer Question 13, assume the population standard deviation is unknown so you use the sample
standard deviation 27.

$\sigma = 24$, $n = 18$, $\bar{X} = 3$, and

$$Z = \frac{3 - 0}{24/\sqrt{18}} = 0.5303,$$

which is between $-1.96$ and $1.96$. So fail to reject, and maintain the null.
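The z statistic above can be reproduced as follows. This sketch assumes, as the solution does, that the population standard deviation is treated as known (24) after Question 13; 1.96 is the usual two-sided 5% normal critical value.

```python
from math import sqrt

sample_mean, mu_0, sigma, n = 3, 0, 24, 18

# z = (sample mean - hypothesised mean) / (sigma / sqrt(n))
z_stat = (sample_mean - mu_0) / (sigma / sqrt(n))
reject_null = abs(z_stat) > 1.96
print(round(z_stat, 4), reject_null)
```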

Based on the following information, answer Question 15 and Question 16.


Pay equality, or equal pay for equal work, refers to the requirement that men and women be paid the same
if performing the same job in the same organisation. A study conducted at an Australian company
looks into the pay gap between male and female employees who perform the same job. The table below
summarises a random sample of hourly wages of male and female employees.

          Sample size   Sample mean     Sample standard deviation
Male          10        $25 per hour    $2 per hour
Female        16        $22 per hour    $1.4 per hour

Question 15. At 5% level, test if the population variances of hourly pay for men and women are the same.
You are given the following critical values: $F_{0.05,10,15} = 2.544$, $F_{0.025,9,15} = 3.123$, $F_{0.05,9,15} = 2.588$,
$F_{0.025,10,16} = 2.986$, $F_{0.05,15,9} = 3.006$, $F_{0.025,15,9} = 3.769$, $F_{0.05,16,9} = 2.989$, $F_{0.025,16,10} = 3.496$.

Denote the male payment as $X$ and the female payment as $Y$. (Students may swap this, but the test result
should be unchanged with respect to the answer provided here.)

$$F = \frac{2^2}{1.4^2} = 2.041 \sim F(v_1 = 9,\ v_2 = 15)$$

The right critical value is $F_{0.025,9,15} = 3.123$ and the left critical value is
$F_{0.975,9,15} = \dfrac{1}{F_{0.025,15,9}} = \dfrac{1}{3.769} = 0.265$.
So fail to reject, and maintain the null.
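The F test can be checked numerically. The sketch hard-codes the two critical values supplied in the question and uses the reciprocal relation for the left tail, as in the solution; names are illustrative.

```python
sd_male, sd_female = 2.0, 1.4

# Ratio of sample variances, F(9, 15) under the null of equal variances
f_stat = sd_male**2 / sd_female**2

upper_crit = 3.123      # F_{0.025,9,15}, given in the question
lower_crit = 1 / 3.769  # F_{0.975,9,15} = 1 / F_{0.025,15,9}
reject_null = f_stat < lower_crit or f_stat > upper_crit
print(round(f_stat, 3), round(lower_crit, 3), reject_null)
```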

Question 16. Based on the above test result, at 5% level test if the mean hourly pay for men and women
is the same. If you cannot answer Question 15, assume variances are unequal. If the two population
variances are not equal, the t statistic has degrees of freedom $v_u = 14$.

$$t_{v_e} = \frac{25 - 22}{\sqrt{\dfrac{2^2(10-1) + 1.4^2(16-1)}{10+16-2}}\ \sqrt{\dfrac{1}{10} + \dfrac{1}{16}}} = 4.508$$

$t_{v_e} \sim t(v_e)$ with $v_e = 10 + 16 - 2 = 24$. So the critical values are $\pm t_{0.025,24} = \pm 2.064$. Reject the null and accept
the alternative.
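The pooled-variance t statistic can be reproduced step by step. A minimal sketch; the critical value 2.064 is the table value from the solution, and the variable names are illustrative.

```python
from math import sqrt

n_m, mean_m, sd_m = 10, 25, 2.0  # male sample
n_f, mean_f, sd_f = 16, 22, 1.4  # female sample

# Pooled variance, justified by the equal-variance result of Question 15
df = n_m + n_f - 2
pooled_var = ((n_m - 1) * sd_m**2 + (n_f - 1) * sd_f**2) / df

t_stat = (mean_m - mean_f) / (sqrt(pooled_var) * sqrt(1 / n_m + 1 / n_f))
reject_null = abs(t_stat) > 2.064  # t_{0.025,24} from the solution
print(df, round(t_stat, 3), reject_null)
```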

Question 17. The following table shows the body weights of 5 cancer patients before and after a course of
radiotherapy. If you would like to test if radiotherapy changes the weight of cancer patients, why can’t
you use the independent sample t test? What is the test you should use?

          Patient A   Patient B   Patient C   Patient D   Patient E
Before    77 kg       58 kg       86 kg       71 kg       68 kg
After     65 kg       55 kg       81 kg       73 kg       59 kg

Because the samples are correlated; for example, patient A shows up in both the sample collected before
the treatment and after. We should use paired sample t test.
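The question only asks which test to use, but a sketch of how the paired statistic would be formed from the table may help. It works on the within-patient differences; no significance level or conclusion is implied, since the exam does not ask for one.

```python
from math import sqrt
from statistics import mean, stdev

before = [77, 58, 86, 71, 68]
after = [65, 55, 81, 73, 59]

# A paired test reduces the two samples to one sample of per-patient differences
diffs = [a - b for a, b in zip(after, before)]

# One-sample t statistic on the differences, with n - 1 degrees of freedom
t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
df = len(diffs) - 1
print(diffs, round(t_stat, 4), df)
```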

To study the factors that affect hourly wages, a random sample of 232 working individuals is taken.
Researchers ran a regression of individual’s hourly wage on their education level (in years), past work
experience (in years) and how much time it takes them to travel to work (in minutes). Based on the
following regression result, answer Question 18 to Question 21.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.357724854
R Square             0.127967071
Adjusted R Square    0.116492953
Standard Error       4.952062977
Observations         232

ANOVA
              df    SS             MS             F              Significance F
Regression    3     820.4885212    273.4961737    11.15267218    7.39556E-07
Residual      228   5591.227521    24.52292772
Total         231   6411.716042

                        Coefficients   Standard Error   t Stat         P-value        Lower 95%       Upper 95%
Intercept               16.93522707    1.979229622      8.556474136    1.72533E-15    13.03530712     20.83514703
Years of schooling      0.533673278    0.10035563       5.317821026    2.50196E-07    0.335930217     0.731416338
Past work experience    0.369050251    0.162859111      2.266070656    0.024385551    0.048148883     0.68995162
Time to commute         0.013428953    0.010139441      1.324427389    0.186687196    -0.006550036    0.033407942

Question 18. Interpret coefficients which are significant at the 1% level.

The intercept and the coefficient on years of schooling are significant at the 1% level.

Intercept: For a person with no education, no past work experience and a zero-minute
commute, the hourly wage is expected to be 16.94 dollars.
Coefficient on years of schooling: Keeping all other variables (work experience and time to commute)
constant, one extra year of education is expected to increase the hourly wage by 0.53 dollars.
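Which coefficients qualify at the 1% level can be read off by comparing each reported P-value with 0.01. A minimal sketch using the values from the regression output; the dictionary is only an illustrative container.

```python
# P-values copied from the regression output
p_values = {
    "Intercept": 1.72533e-15,
    "Years of schooling": 2.50196e-07,
    "Past work experience": 0.024385551,
    "Time to commute": 0.186687196,
}

alpha = 0.01
significant = [name for name, p in p_values.items() if p < alpha]
print(significant)
```

Past work experience is significant at the 5% level but not at the 1% level, so it is excluded here.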

Question 19. Write down the estimated model and interpret the R square.

$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1,i} + \hat{\beta}_2 x_{2,i} + \hat{\beta}_3 x_{3,i} + e_i$$

where $\hat{\beta}_0 = 16.94$, $\hat{\beta}_1 = 0.53$, $\hat{\beta}_2 = 0.37$, and $\hat{\beta}_3 = 0.01$.

The above suffices for the full 2 points. Replacing the $x$’s with variable names and the $\hat{\beta}$’s with estimated values is
also correct. If a student omits $e_i$ but writes $\hat{y}_i$, this is also correct. But omitting $e_i$, or writing down $\epsilon_i$ while
keeping $y_i$, leads to a 0.5 deduction.

Question 20. What are the null and alternative hypotheses behind the ANOVA test (the table in the
middle)? At the 10% level, what is the conclusion of the test?

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$, $H_a$: at least one $\beta$ is not equal to zero.

Significance F ($7.40 \times 10^{-7}$) is below 0.10, so reject the null, and accept the alternative.
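The summary figures in the regression output are linked by standard identities, which the sketch below checks: R square is the share of total variation explained (about 12.8% here), F is the ratio of the two mean squares, and adjusted R square penalises the number of regressors. Variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the ANOVA table
ss_regression, ss_residual, ss_total = 820.4885212, 5591.227521, 6411.716042
df_regression, df_residual = 3, 228
n = 232

r_square = ss_regression / ss_total
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

print(round(r_square, 6), round(f_stat, 4), round(adj_r_square, 6))
```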

Question 21. The scatter plot below shows the bivariate relationship between hourly wages and years of
schooling. Do you think the regression model is sound? Why / why not?

No, because years of schooling may have a diminishing effect on hourly wages.

The revenue of a public company in the fashion industry is thought to be driven by its investment in
advertisements. Its marketing department would like to conduct research that looks into the quantitative
effect that advertisement cost has on the revenue. The department has obtained data and run the following
regression:

$$y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i} + \epsilon_i,$$

where
$y_i$ is the natural logarithm of revenue “ln(revenue)” in month $i$;
$x_{1,i}$ is the natural logarithm of advertisement cost “ln(Ads cost)” in month $i$;
$x_{2,i}$ is the average share price of the company “Share price” in month $i$ (in dollars);
$x_{3,i}$ is the product of the natural logarithm of advertisement cost in month $i$ and the average share price of
the company in month $i$, i.e. $x_{3,i} = x_{1,i} \times x_{2,i}$;
$x_{4,i}$ is a dummy variable “Social media ads” that equals 1 if in month $i$ the company utilises social media
advertising and 0 otherwise;
$\epsilon_i$ is a monthly error term that follows a normal distribution.

Based on the regression result given below, answer Question 22 to Question 25.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.960468877
R Square             0.922500464
Adjusted R Square    0.921736921
Standard Error       3.651409608
Observations         411

ANOVA
              df    SS             MS             F           Significance F
Regression    4     64433.93127    16108.48282    1208.185    6.3459E-224
Residual      406   5413.113602    13.33279212
Total         410   69847.04487

                            Coefficients    Standard Error   t Stat          P-value     Lower 95%       Upper 95%
Intercept                   1.006847039     3.1934502        0.315285029     0.752707    -5.270914577    7.284608654
ln(Ads cost)                1.635776564     0.292653818      5.589459167     4.19E-08    1.060470613     2.211082515
Share price                 -0.310750702    0.23638227       -1.314610868    0.189383    -0.775436684    0.153935281
ln(Ads cost)*Share price    0.175158859     0.021632489      8.09702754      6.64E-15    0.132633189     0.217684529
Social media ads            0.829517172     0.362752596      2.286729802     0.022726    0.116409348     1.542624996

Question 22. Interpret the coefficient on “Social media ads”.

Keeping all other variables constant, using social media advertising is expected to increase monthly
revenue by 82.95%.
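The answer above reads the dummy coefficient directly as a percentage change, which is the usual approximation when the dependent variable is in logs. As a hedged aside, the sketch below compares it with the exact implied change, 100 × (exp(β) − 1); for a coefficient this large the two differ noticeably. Which convention is expected for marking is not stated in the source.

```python
from math import exp

beta_social = 0.829517172  # coefficient on "Social media ads"

approx_pct = 100 * beta_social             # approximation used in the answer
exact_pct = 100 * (exp(beta_social) - 1)   # exact change implied by the log model
print(round(approx_pct, 2), round(exact_pct, 1))
```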

Question 23. Predict revenue if the company uses social media and the total advertisement cost amounts
to $6,000 and the average monthly share price is $3.

The regression line is given by

$$\ln(\text{revenue}) = 1.01 + 1.64 \times \ln(\text{ad cost}) - 0.31 \times \text{share price} + 0.18 \times \ln(\text{ad cost}) \times \text{share price} + 0.83 \times \text{social media}$$

The question did not specify the unit for advertisement cost, so plugging in either 6000 or 6 for ad cost is
correct (in some lecture examples, advertisement cost is denominated in $K, so students may think the
same applies here).

$$\ln(\text{revenue}) = 1.01 + 1.64 \times \ln(6000) - 0.31 \times 3 + 0.18 \times \ln(6000) \times 3 + 0.83 \times 1 = 19.87,$$

which means revenue is expected to be $\exp(19.87) = 4.26 \times 10^{8}$.

Plugging in 6 instead of 6000 is also treated as correct:

$$\ln(\text{revenue}) = 1.01 + 1.64 \times \ln(6) - 0.31 \times 3 + 0.18 \times \ln(6) \times 3 + 0.83 \times 1 = 4.82,$$

which means revenue is expected to be $\exp(4.82) = 123.97$.
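Both predictions can be reproduced with a small helper. This is a sketch using the rounded coefficients from the solution; the function name `predict_ln_revenue` is hypothetical.

```python
from math import log

def predict_ln_revenue(ad_cost, share_price, social_media):
    """Fitted ln(revenue) using the rounded coefficients from the solution."""
    ln_ad = log(ad_cost)
    return (1.01 + 1.64 * ln_ad - 0.31 * share_price
            + 0.18 * ln_ad * share_price + 0.83 * social_media)

ln_rev_dollars = predict_ln_revenue(6000, 3, 1)  # ad cost entered in dollars
ln_rev_thousands = predict_ln_revenue(6, 3, 1)   # ad cost entered in $K
print(round(ln_rev_dollars, 2), round(ln_rev_thousands, 2))
```

Applying `exp` to either value gives the predicted revenue level in the corresponding unit convention.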

Question 24. All other factors held constant, at the share price of $7, what effect does a 2% increase of
advertisement cost have on revenue?

A 2% increase in ads cost is expected to increase revenue by $[2 \times (1.636 + 7 \times 0.175)]\% = 5.722\%$.
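Because of the interaction term, the elasticity of revenue with respect to advertisement cost depends on the share price. The sketch below follows the approximation used in the answer (elasticity times percentage change); names are illustrative.

```python
beta_ln_ads = 1.636       # coefficient on ln(Ads cost), rounded as in the answer
beta_interaction = 0.175  # coefficient on ln(Ads cost) * Share price
share_price = 7
pct_change_ads = 2

# Elasticity of revenue with respect to ad cost at the given share price
elasticity = beta_ln_ads + beta_interaction * share_price
pct_change_revenue = pct_change_ads * elasticity
print(round(pct_change_revenue, 3))
```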

Question 25. Being a perfectionist, the CEO argues that the R square of this regression is not high
enough, so the marketing department should add more variables. A potential variable to add is the
dividend per share paid to shareholders, which equals a fixed percentage of the monthly share price plus
some random noise. After adding this variable, the new table of regression summary is obtained and
given below.

Regression Statistics
Multiple R 0.962155365
R Square 0.92755463
Adjusted R Square 0.910346574
Standard Error 3.785200146
Observations 411

Explain to the CEO why adding this variable is not a good idea.

Adding this variable decreases the adjusted R square, suggesting multicollinearity. In other
words, the newly added variable is close to a linear function of an existing variable (share price) and
thus brings no new relevant information for explaining the variation of the dependent variable,
ln(revenue). It is not a good idea to add this variable, because it also introduces extra noise, which
makes the regression and the estimated coefficients less precise. This can be seen in the increased
regression standard error reported in the above table.
