
Hand-out # 1

To construct histograms:
1. Data are first organised into a table which arranges the data into class intervals (also called
bins), i.e. subdivisions of the total range of values which the variable takes.
In principle, bins do not have to be of equal width, but for simplicity use equal widths
wherever possible.
As a guide, six or seven bins should be sufficient, but remember to exercise common sense.
2. For each class interval, the corresponding frequency is determined, that is, the number of
observations of the variable which fall in that interval.
3. Make two more columns for frequency density (frequency/class width) and cumulative
frequency.
Note the final column is not required for a histogram per se, although computation of
cumulative frequencies may be useful when determining medians and quartiles (to be
discussed later in this chapter).
4. Adjacent bars are drawn over the respective class intervals such that the area of each bar is
proportional to the interval frequency. This explains why equal bin widths are desirable
since this reduces the problem to making the heights proportional to the interval frequency.
However, you may be told to use a particular number of bins or bin widths, such that bins
will not all be of equal width. In such cases, you will need to compute the frequency density
as outlined above.
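The table described in steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration: the data values, the bin edges and the function name histogram_table are made up for the example, and each bin is assumed to include its lower edge but not its upper edge.

```python
# Hypothetical data and bin edges, chosen only to illustrate the table layout.
data = [352, 358, 361, 364, 367, 371, 373, 375, 378, 382, 389, 395, 404, 418]
edges = [350, 360, 370, 380, 400, 420]  # the last two bins are wider on purpose


def histogram_table(values, bin_edges):
    """Return one row per bin: (lower, upper, frequency, density, cumulative)."""
    rows = []
    cumulative = 0
    for lower, upper in zip(bin_edges, bin_edges[1:]):
        # Count observations in the half-open interval [lower, upper).
        frequency = sum(lower <= v < upper for v in values)
        cumulative += frequency
        # Frequency density = frequency / class width (this is the bar height).
        density = frequency / (upper - lower)
        rows.append((lower, upper, frequency, density, cumulative))
    return rows


table = histogram_table(data, edges)
for lower, upper, freq, dens, cum in table:
    print(f"[{lower}, {upper})  f={freq:2d}  density={dens:.3f}  cum={cum:2d}")
```

With unequal bin widths, as here, plotting the density column rather than the frequency column keeps bar areas proportional to frequencies.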
Key points to note:
- All bars are centred on the midpoints of each class interval.
- Informative labels on the histogram, i.e. title and axis labels, must be provided!
- Because area represents frequency, it follows that the dimension of bar heights is number
per unit class interval, hence the y-axis should be labelled frequency density rather than
frequency.
- Histograms must be drawn in pen on graph paper.

Zone A, 2011 (solutions are overleaf)

Choice of the stem involves determining a major component of a typical data item, for
example the tens digit, or, if the data are of the form 1.4, 1.8, 2.1, 2.9, . . ., the integer part
would be appropriate.

The remainder of the data value plays the role of leaf. A leaf is always a single digit!
Applied to the weekly production dataset, we obtain the stem-and-leaf diagram below.

Key:
45 | 4 = 454

Note the following points:


- These stems are equivalent to using the (discrete)
class intervals 350–359, 360–369, 370–379.
- Leaves are vertically aligned.
- A key MUST be provided.
- The leaves are placed in order of magnitude within
the stems, so it is a good idea to sort the
raw data into ascending order first.
- Unlike the histogram, the actual data values are
preserved. This is advantageous if we want to
calculate various (descriptive or summary) statistics.
- Note the informative title and labels for the stem
and leaf.
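As a rough sketch of the construction, the following Python splits each value into a stem (all digits except the last) and a single-digit leaf. The data values are hypothetical and are not the weekly production dataset; the function name stem_and_leaf is likewise just for illustration.

```python
from collections import defaultdict

# Hypothetical three-digit observations, used only to show the mechanics.
data = [367, 352, 381, 359, 374, 360, 352, 378, 365, 383]


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its ordered single-digit leaves."""
    stems = defaultdict(list)
    for value in sorted(values):        # sort first so leaves come out in order
        stem, leaf = divmod(value, 10)  # e.g. 367 -> stem 36, leaf 7
        stems[stem].append(leaf)
    return dict(stems)


diagram = stem_and_leaf(data)
print("Stem | Leaf")
for stem in range(min(diagram), max(diagram) + 1):
    # Print every stem in the range, even one with no leaves.
    leaves = " ".join(str(leaf) for leaf in diagram.get(stem, []))
    print(f"{stem:>4} | {leaves}")
print("Key: 35 | 2 = 352")
```

Sorting before splitting is what guarantees the leaves appear in order of magnitude within each stem.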

Zone B, 2011

Solution

Hand-out # 2

In a box plot, the middle horizontal line is the median and the upper and lower ends of the
box are the upper and lower quartiles, respectively.

The whiskers are drawn from the quartiles to the observations furthest from the median,
but extend no more than one-and-a-half times the interquartile range (IQR) beyond the
quartiles (i.e. outliers are excluded).

The whiskers are terminated by horizontal lines.

Any extreme points beyond the whiskers are plotted individually.
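
The quantities needed to draw a box plot can be computed as in the sketch below. The sample is hypothetical, and the quartiles use Python's "inclusive" interpolation method, which is only one of several conventions; a hand calculation following a different quartile rule may give slightly different values.

```python
import statistics

# Hypothetical sample; 40 is deliberately extreme so that one outlier appears.
data = [12, 14, 15, 15, 16, 17, 18, 19, 21, 40]


def box_plot_summary(values):
    """Return quartiles, whisker ends and outliers using the 1.5 x IQR rule."""
    ordered = sorted(values)
    # Quartile conventions differ between textbooks; 'inclusive' is one choice.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations that lie within the fences.
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


summary = box_plot_summary(data)
print(summary)
```

Here the value 40 lies above the upper fence, so it would be plotted individually and the upper whisker would end at 21.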

An example of a (generic) box plot is given below.

Zone A, 2013 (solution is provided overleaf)

Hand-out # 3

Question 5 - Zone A, 2013

Solution 1

Solution 2

Solution 3

Solution 4

Solution 5

Hand-out # 4

Question 3

Question 4

Question 5


Solution 1

Solution 2

Solution 3


Hand-out # 5

Question 4

Question 5


Solution 1

Solution 2

Solution 3


Solution 4

Solution 5


Hand-out # 6
Suppose a simple random sample of 50 households is taken from a population of 1,000 households
in an area of a town. The sample mean and standard deviation of weekly expenditure on alcoholic
beverages are 18 and 4, respectively.
How many more households should you sample if it is required that your final estimate should have
a standard error less than 0.19?
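
One way to approach this question is sketched below, assuming the standard error of the sample mean is estimated as s / sqrt(n) with s = 4. The finite population correction for sampling from 1,000 households is ignored here, which is an assumption; if the correction is expected, the required sample size would be smaller. The function name households_needed is illustrative.

```python
import math


def households_needed(sd, target_se):
    """Smallest n for which sd / sqrt(n) is strictly below target_se.

    Assumes no finite population correction is applied.
    """
    # sd / sqrt(n) < target_se  is equivalent to  n > (sd / target_se) ** 2
    return math.floor((sd / target_se) ** 2) + 1


already_sampled = 50
total_required = households_needed(sd=4, target_se=0.19)
additional = total_required - already_sampled

print(total_required)  # 444
print(additional)      # 394
```

Under these assumptions the threshold is (4 / 0.19)^2, roughly 443.2, so 444 households in total are needed, i.e. 394 more than the 50 already sampled.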

8)

Suppose the reaction time of a patient to a certain stimulus is known to have a standard deviation of
0.05 seconds.
How large a sample of measurements must a psychologist take in order to be
a) 95% and
b) 99% confident that the error in his estimate of the mean reaction time will not exceed 0.01
seconds?
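
A sketch of the usual calculation follows, assuming the sample mean is approximately normal with known standard deviation, so that the required sample size satisfies n >= (z * sigma / e)^2. The function name is illustrative, and z is computed by the standard library rather than read from tables, so rounded table values (e.g. 2.58) could give a marginally different answer.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, max_error, confidence):
    """Smallest n such that z * sigma / sqrt(n) does not exceed max_error."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / max_error) ** 2)


n_95 = sample_size_for_mean(sigma=0.05, max_error=0.01, confidence=0.95)
n_99 = sample_size_for_mean(sigma=0.05, max_error=0.01, confidence=0.99)

print(n_95)  # 97
print(n_99)  # 166
```

The result is always rounded up, since rounding down would leave the error bound slightly above 0.01 seconds.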


Hand-out # 7
Paired-sample methods are used in special cases when the two samples are not statistically
independent. For our purposes, such paired data are likely to involve observations on the same
individuals in two different states, specifically before and after some intervening event.
A paired-sample experimental design is advantageous since it allows researchers to determine
whether or not significant changes have occurred as a result of the intervening event free from bias
from other factors since these have been controlled for by observing the same individuals.
A necessary, but not sufficient, indicator for the presence of paired sample data is that n1 = n2, in
order to have pairs of data values.
This scenario is easy to analyse as the paired data can simply be reduced to a one sample analysis
by working with differenced data. That is, suppose two samples generated sample values x1, x2, . . . ,
xn and y1, y2, . . . , yn respectively (note the same number of observations, n, in each sample).
Compute the differences, i.e. d1 = x1 − y1, d2 = x2 − y2, . . . , dn = xn − yn.
By using the differences to compute a confidence interval for μd, we get the required
confidence interval for μx − μy.
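
The reduction to a one-sample problem can be sketched as follows. The before and after values are hypothetical, and the t critical value (2.776 for 95% confidence with 4 degrees of freedom) is taken from standard t tables and passed in as an input rather than computed. The function name is illustrative.

```python
import math
import statistics

# Hypothetical before/after measurements on the same five individuals.
before = [72, 80, 65, 90, 78]
after = [70, 75, 66, 84, 73]

# 95% two-sided t critical value for n - 1 = 4 degrees of freedom (from tables).
T_CRIT_95_DF4 = 2.776


def paired_confidence_interval(x, y, t_crit):
    """Confidence interval for the mean difference, based on d_i = x_i - y_i."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same number of observations")
    differences = [xi - yi for xi, yi in zip(x, y)]
    n = len(differences)
    mean_d = statistics.mean(differences)
    # Standard error of the mean difference uses the sample standard deviation.
    se_d = statistics.stdev(differences) / math.sqrt(n)
    return mean_d - t_crit * se_d, mean_d + t_crit * se_d


low, high = paired_confidence_interval(before, after, T_CRIT_95_DF4)
print(f"95% CI for the mean difference: ({low:.3f}, {high:.3f})")
```

For this made-up sample the interval contains zero, so these data alone would not indicate a significant change at the 5% level.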


Hand-out # 8

Question 4


Solution 1

Solution 2

Solution 3

Solution 4


Hand-out # 9
We choose between two statements about the value of a parameter based on evidence obtained
from sample data.
Our objective is to choose between these two conflicting statements about the population, where
these statements are known as hypotheses. By convention these are denoted by H0 and H1.
The null hypothesis, H0, will always denote the parameter value with equality (=), i.e. H0 : θ = θ0.
In contrast, the alternative hypothesis, H1, will take one of three forms, i.e. using ≠, <, or >, that is
H1 : θ ≠ θ0 or H1 : θ < θ0 or H1 : θ > θ0.
Note that only one of these forms will be used per test.
H1 : θ ≠ θ0   two-tailed test: use α/2
H1 : θ < θ0   lower-tailed (one-sided) test: use α, with a negative sign on the critical value
H1 : θ > θ0   upper-tailed (one-sided) test: use α, with a positive sign on the critical value
Always assume the null hypothesis, H0, is true: it is the working hypothesis.
Type I error: Rejecting H0 when it is true. This can be thought of as a false positive. Denote the
probability of this type of error by α.
Type II error: Failing to reject H0 when it is false. This can be thought of as a false negative. Denote
the probability of this type of error by β.

Steps of conducting a hypothesis test

1. Define the hypotheses.
2. State test statistic and compute its value.
3. Define critical region for given significance level, α.
4. Choose hypothesis.
Upper-tailed test: reject null hypothesis if test statistic > critical value.
Lower-tailed test: reject null hypothesis if test statistic < critical value.
Two-tailed test: reject null hypothesis if test statistic > + critical value or
test statistic < − critical value.
5. Retest at appropriate levels.
6. Draw conclusions.
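
The decision rule in step 4 can be sketched in Python for the special case where the test statistic follows a standard normal distribution under H0 (a z test). That distributional assumption, the function name reject_null and the tail labels are all choices made for this illustration; other tests would take their critical values from a different distribution.

```python
from statistics import NormalDist


def reject_null(test_statistic, alpha, tail):
    """Return True if H0 is rejected at significance level alpha.

    tail must be 'upper', 'lower' or 'two'. Critical values come from the
    standard normal distribution (assumed distribution under H0).
    """
    std_normal = NormalDist()
    if tail == "upper":
        return test_statistic > std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        # The lower-tail critical value is negative.
        return test_statistic < std_normal.inv_cdf(alpha)
    if tail == "two":
        # Split alpha equally between the two tails.
        critical = std_normal.inv_cdf(1 - alpha / 2)
        return test_statistic > critical or test_statistic < -critical
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


# A hypothetical test statistic of -2.5, tested two-tailed at the 5% level.
print(reject_null(-2.5, alpha=0.05, tail="two"))  # True
```

Step 5 (retesting at another level) then amounts to calling the same function again with a different alpha.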

P-value



(solution overleaf)


Hand-out # 10

Question 3

Question 4

Question 5 - Zone A, 2013


Solution 1

Solution 2

Solution 3

Solution 4


Solution 5


Hand-out # 11

This type of test examines the null hypothesis that two factors (or attributes) are not associated, against
the alternative hypothesis that they are associated.
Each data unit we sample has one level (or type or variety) of each factor.
Suppose that we are sampling people, and that one factor of interest is hair colour (blonde, brown,
black, etc.) while another factor of interest is eye colour (blue, brown, green, etc.).
We wish to test whether or not these factors are associated.
Hence,
H0 : No association between hair colour and eye colour.
H1 : There is an association between hair colour and eye colour.
So under H0 the distribution of eye colour is the same for blondes as it is for brunettes etc., whereas
if H1 is true it may be attributable to blonde-haired people having a (significantly) higher proportion
of blue eyes, say.
In three areas of a city a record has been kept of the numbers of burglaries, robberies and car thefts
that take place in a year. The total number of offences was 150, and they were divided into the
various categories as shown in the following contingency table:

The cell frequencies are known as observed frequencies.


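Expected frequency for the cell in row i and column j:
Eij = [(row i total)(column j total)] / grand total

Here r = c = 3 and α = 0.01.
Degrees of freedom: (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4
Critical value at the 1% significance level: 13.277
Test statistic: 23.13 > 13.277, so H0 is rejected at the 1% significance level.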
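
The calculation can be sketched in Python as below, taking each expected frequency as (row total × column total) / grand total and summing (O − E)² / E over the cells. The 3 × 3 table of counts is hypothetical and is not the handout's burglary data; the critical value 13.277 for 4 degrees of freedom at the 1% level is the one quoted in this hand-out. The function name is illustrative.

```python
# Hypothetical 3 x 3 table of observed counts (rows: areas, columns: offence types).
observed = [
    [30, 10, 10],
    [15, 20, 15],
    [15, 15, 20],
]

CRITICAL_VALUE_1PCT_DF4 = 13.277  # chi-squared, 4 degrees of freedom, 1% level


def chi_squared_association(table):
    """Return (expected frequencies, test statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency under H0: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


expected, statistic, df = chi_squared_association(observed)
print(f"chi-squared = {statistic:.3f} on {df} degrees of freedom")
print("Reject H0 at 1%" if statistic > CRITICAL_VALUE_1PCT_DF4 else "Do not reject H0 at 1%")
```

For a table with a different shape the degrees of freedom change, so the critical value would have to be looked up again.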

(Solution overleaf)

Hand-out # 12


Solution 1

Solution 2

Solution 3


Hand-out # 13
Correlation and regression enable us to see the connection between the actual dimensions of
two or more variables.


The sample correlation coefficient is calculated using:

r = Sxy / √(Sxx Syy)

where Sxy = Σ(xi − x̄)(yi − ȳ), Sxx = Σ(xi − x̄)² and Syy = Σ(yi − ȳ)².
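
As a small illustration, the sample correlation coefficient can be computed from its standard definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The data below are hypothetical and the function name is chosen for the example.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sample_correlation(xs, ys):
    """Sample correlation coefficient r from deviations about the means."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_xx = sum((a - mean_x) ** 2 for a in xs)
    s_yy = sum((b - mean_y) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = sample_correlation(x, y)
print(f"r = {r:.4f}")  # r = 0.7746
```

The value always lies between −1 and +1, with values near either extreme indicating a strong linear relationship.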


Hand-out # 14


Solution 1

Solution 2
